On 03/27/2015 11:04 AM, Jonathan Heese wrote:
> On Mar 27, 2015, at 1:29 AM, "Mohammed Rafi K C" <rkavunga at redhat.com> wrote:
>
>>
>> When we change the transport from x to y, it should be reflected in all
>> the vol files. Unfortunately, the volume set command failed to make the
>> change in the nfs server volfile (which is, of course, a bug). As I
>> mentioned in my previous mails, changing the volume files using the
>> volume set command is not recommended; I suggested it only to check
>> whether tcp works fine or not.
>>
>> The reason you are getting the rdma connection error is that the bricks
>> are now running over tcp, so the brick processes are listening on TCP
>> sockets. But the nfs-server still asks for an rdma connection, so it is
>> trying to connect from an rdma port to a tcp port.
>> Obviously the connection will be rejected.
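>>
>> As a quick sanity check (just a sketch; the brick port 49152 is the one
>> reported by "gluster volume status" later in this thread), you can
>> confirm on each server that the bricks are now listening on TCP sockets:
>>
>> gluster volume status gluster_disk    # note the port shown for each brick
>> netstat -tlnp | grep 49152            # the glusterfsd brick should appear as a TCP listener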
>
> Okay, thanks for the thorough explanation there.
>
> Now that we know that TCP does function without the original I/O
> errors (from the start of this thread), how do you suggest that I proceed?
>
> Do I have to wait for a subsequent release to rid myself of this bug?
I will make sure the bug gets fixed. We can also expect that the rdma
patches will be merged soon. Once the rdma bugs are fixed in 3.5.x, you can
test and switch back to rdma.
>
> Would it be feasible for me to switch from RDMA to TCP in a more
> permanent fashion (maybe wipe the cluster and start over?)?
Either you can manually edit the nfs volfile, changing the transport to
"option transport-type tcp" in all the places, and then restart nfs to
solve the problem; or, if possible, you can start a fresh cluster
running on tcp.
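
As a rough sketch (assuming the volfile path mentioned earlier in this
thread), something like the following on each server would change every
remaining rdma transport line; verify with grep before restarting nfs:

sed -i 's/option transport-type rdma/option transport-type tcp/' /var/lib/glusterd/nfs/nfs-server.vol
grep transport-type /var/lib/glusterd/nfs/nfs-server.vol    # should now show only tcp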
Rafi
>
> Thanks.
>
> Regards,
> Jon Heese
>
>> Regards
>> Rafi KC
>>
>> On 03/27/2015 12:28 AM, Jonathan Heese wrote:
>>>
>>> Rafi,
>>>
>>>
>>> Here is my nfs-server.vol file:
>>>
>>>
>>> [root at duke ~]# cat /var/lib/glusterd/nfs/nfs-server.vol
>>> volume gluster_disk-client-0
>>> type protocol/client
>>> option send-gids true
>>> option password 562ab460-7754-4b5a-82e6-18ed6c130786
>>> option username ad5d5754-cf02-4b96-9f85-ff3129ae0405
>>> option transport-type rdma
>>> option remote-subvolume /bricks/brick1
>>> option remote-host duke-ib
>>> end-volume
>>>
>>> volume gluster_disk-client-1
>>> type protocol/client
>>> option send-gids true
>>> option password 562ab460-7754-4b5a-82e6-18ed6c130786
>>> option username ad5d5754-cf02-4b96-9f85-ff3129ae0405
>>> option transport-type rdma
>>> option remote-subvolume /bricks/brick1
>>> option remote-host duchess-ib
>>> end-volume
>>>
>>> volume gluster_disk-replicate-0
>>> type cluster/replicate
>>> subvolumes gluster_disk-client-0 gluster_disk-client-1
>>> end-volume
>>>
>>> volume gluster_disk-dht
>>> type cluster/distribute
>>> subvolumes gluster_disk-replicate-0
>>> end-volume
>>>
>>> volume gluster_disk-write-behind
>>> type performance/write-behind
>>> subvolumes gluster_disk-dht
>>> end-volume
>>>
>>> volume gluster_disk
>>> type debug/io-stats
>>> option count-fop-hits off
>>> option latency-measurement off
>>> subvolumes gluster_disk-write-behind
>>> end-volume
>>>
>>> volume nfs-server
>>> type nfs/server
>>> option nfs3.gluster_disk.volume-id 2307a5a8-641e-44f4-8eaf-7cc2b704aafd
>>> option rpc-auth.addr.gluster_disk.allow *
>>> option nfs.drc off
>>> option nfs.nlm on
>>> option nfs.dynamic-volumes on
>>> subvolumes gluster_disk
>>> end-volume
>>>
>>> I see that "transport-type rdma" is listed a couple times here, but
>>> "gluster volume info" indicates that the volume is using the tcp
>>> transport:
>>>
>>>
>>> [root at duke ~]# gluster volume info gluster_disk
>>>
>>> Volume Name: gluster_disk
>>> Type: Replicate
>>> Volume ID: 2307a5a8-641e-44f4-8eaf-7cc2b704aafd
>>> Status: Started
>>> Number of Bricks: 1 x 2 = 2
>>> Transport-type: tcp
>>> Bricks:
>>> Brick1: duke-ib:/bricks/brick1
>>> Brick2: duchess-ib:/bricks/brick1
>>> Options Reconfigured:
>>> config.transport: tcp
>>>
>>> Please let me know if you need any further information from me to
>>> determine how to correct this discrepancy.
>>>
>>>
>>> Also, I feel compelled to ask: Since the TCP connections are going
>>> over the InfiniBand connections between the two Gluster servers
>>> (based on the hostnames which are pointed to the IB IPs via hosts
>>> files), are there any (significant) drawbacks to using TCP instead
>>> of RDMA here? Thanks.
>>>
>>>
>>> Regards,
>>>
>>> Jon Heese
>>>
>>>
>>>
>>> ------------------------------------------------------------------------
>>> *From:* Mohammed Rafi K C <rkavunga at redhat.com>
>>> *Sent:* Monday, March 23, 2015 3:29 AM
>>> *To:* Jonathan Heese
>>> *Cc:* gluster-users
>>> *Subject:* Re: [Gluster-users] I/O error on replicated volume
>>>
>>>
>>> On 03/23/2015 11:28 AM, Jonathan Heese wrote:
>>>> On Mar 23, 2015, at 1:20 AM, "Mohammed Rafi K C"
>>>> <rkavunga at redhat.com <mailto:rkavunga at
redhat.com>> wrote:
>>>>
>>>>>
>>>>> On 03/21/2015 07:49 PM, Jonathan Heese wrote:
>>>>>>
>>>>>> Mohammed,
>>>>>>
>>>>>>
>>>>>> I have completed the steps you suggested (unmount all, stop the
>>>>>> volume, set the config.transport to tcp, start the volume, mount,
>>>>>> etc.), and the behavior has indeed changed.
>>>>>>
>>>>>>
>>>>>> [root at duke ~]# gluster volume info
>>>>>>
>>>>>> Volume Name: gluster_disk
>>>>>> Type: Replicate
>>>>>> Volume ID: 2307a5a8-641e-44f4-8eaf-7cc2b704aafd
>>>>>> Status: Started
>>>>>> Number of Bricks: 1 x 2 = 2
>>>>>> Transport-type: tcp
>>>>>> Bricks:
>>>>>> Brick1: duke-ib:/bricks/brick1
>>>>>> Brick2: duchess-ib:/bricks/brick1
>>>>>> Options Reconfigured:
>>>>>> config.transport: tcp
>>>>>>
>>>>>>
>>>>>> [root at duke ~]# gluster volume status
>>>>>> Status of volume: gluster_disk
>>>>>> Gluster process                                 Port    Online  Pid
>>>>>> ------------------------------------------------------------------------------
>>>>>> Brick duke-ib:/bricks/brick1                    49152   Y       16362
>>>>>> Brick duchess-ib:/bricks/brick1                 49152   Y       14155
>>>>>> NFS Server on localhost                         2049    Y       16374
>>>>>> Self-heal Daemon on localhost                   N/A     Y       16381
>>>>>> NFS Server on duchess-ib                        2049    Y       14167
>>>>>> Self-heal Daemon on duchess-ib                  N/A     Y       14174
>>>>>>
>>>>>> Task Status of Volume gluster_disk
>>>>>> ------------------------------------------------------------------------------
>>>>>> There are no active volume tasks
>>>>>>
>>>>>> I am no longer seeing the I/O errors during prolonged periods of
>>>>>> write I/O that I was seeing when the transport was set to rdma.
>>>>>> However, I am seeing this message on both nodes every 3 seconds
>>>>>> (almost exactly):
>>>>>>
>>>>>>
>>>>>> ==> /var/log/glusterfs/nfs.log <==
>>>>>> [2015-03-21 14:17:40.379719] W
>>>>>> [rdma.c:1076:gf_rdma_cm_event_handler] 0-gluster_disk-client-1:
>>>>>> cma event RDMA_CM_EVENT_REJECTED, error 8 (me:10.10.10.1:1023
>>>>>> peer:10.10.10.2:49152)
>>>>>>
>>>>>>
>>>>>> Is this something to worry about?
>>>>>>
>>>>> If you are not using nfs to export the volumes, there is nothing
>>>>> to worry about.
>>>>
>>>> I'm using the native glusterfs FUSE component to mount the volume
>>>> locally on both servers -- I assume that you're referring to the
>>>> standard NFS protocol stuff, which I'm not using here.
>>>>
>>>> Incidentally, I would like to keep my logs from filling up with
>>>> junk if possible. Is there something I can do to get rid of these
>>>> (useless?) error messages?
>>>
>>> If I understand correctly, you are getting this enormous number of log
>>> messages from the nfs log only, and all other logs and everything else
>>> are fine now, right? If that is the case, and you are not using nfs at
>>> all for exporting the volume, then as a workaround you can disable nfs
>>> for your volume or cluster (gluster v set <volname> nfs.disable on).
>>> This will turn off your gluster nfs server, and you will no longer get
>>> those log messages.
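>>>
>>> For the volume in this thread, that would be something like (a sketch;
>>> run on either gluster node):
>>>
>>> gluster volume set gluster_disk nfs.disable on
>>> gluster volume info gluster_disk | grep nfs.disable   # confirm it took effect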
>>>
>>>
>>>>>> Any idea why there are rdma pieces in play when I've set my
>>>>>> transport to tcp?
>>>>>>
>>>>>
>>>>> There should not be any piece of rdma. If possible, can you paste
>>>>> the volfile for the nfs server? You can find the volfile in
>>>>> /var/lib/glusterd/nfs/nfs-server.vol or
>>>>> /usr/local/var/lib/glusterd/nfs/nfs-server.vol
>>>>
>>>> I will get this for you when I can. Thanks.
>>>
>>> If you can get that, it would be a great help in understanding the problem.
>>>
>>>
>>> Rafi KC
>>>
>>>>
>>>> Regards,
>>>> Jon Heese
>>>>
>>>>> Rafi KC
>>>>>>
>>>>>> The actual I/O appears to be handled properly and I've seen no
>>>>>> further errors in the testing I've done so far.
>>>>>>
>>>>>>
>>>>>> Thanks.
>>>>>>
>>>>>>
>>>>>> Regards,
>>>>>>
>>>>>> Jon Heese
>>>>>>
>>>>>>
>>>>>>
>>>>>> ------------------------------------------------------------------------
>>>>>> *From:* gluster-users-bounces at gluster.org
>>>>>> <gluster-users-bounces at gluster.org> on behalf of Jonathan Heese
>>>>>> <jheese at inetu.net>
>>>>>> *Sent:* Friday, March 20, 2015 7:04 AM
>>>>>> *To:* Mohammed Rafi K C
>>>>>> *Cc:* gluster-users
>>>>>> *Subject:* Re: [Gluster-users] I/O error on replicated volume
>>>>>>
>>>>>> Mohammed,
>>>>>>
>>>>>> Thanks very much for the reply. I will try that and report back.
>>>>>>
>>>>>> Regards,
>>>>>> Jon Heese
>>>>>>
>>>>>> On Mar 20, 2015, at 3:26 AM, "Mohammed Rafi K C"
>>>>>> <rkavunga at redhat.com> wrote:
>>>>>>
>>>>>>>
>>>>>>> On 03/19/2015 10:16 PM, Jonathan Heese wrote:
>>>>>>>>
>>>>>>>> Hello all,
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Does anyone else have any further suggestions for
>>>>>>>> troubleshooting this?
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> To sum up: I have a 2 node 2 brick replicated volume, which
>>>>>>>> holds a handful of iSCSI image files which are mounted and
>>>>>>>> served up by tgtd (CentOS 6) to a handful of devices on a
>>>>>>>> dedicated iSCSI network. The most important iSCSI clients
>>>>>>>> (initiators) are four VMware ESXi 5.5 hosts that use the iSCSI
>>>>>>>> volumes as backing for their datastores for virtual machine
>>>>>>>> storage.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> After a few minutes of sustained writing to the volume, I am
>>>>>>>> seeing a massive flood (over 1500 per second at times) of this
>>>>>>>> error in /var/log/glusterfs/mnt-gluster-disk.log:
>>>>>>>>
>>>>>>>> [2015-03-16 02:24:07.582801] W
>>>>>>>> [fuse-bridge.c:2242:fuse_writev_cbk] 0-glusterfs-fuse: 635358:
>>>>>>>> WRITE => -1 (Input/output error)
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> When this happens, the ESXi box fails its write operation and
>>>>>>>> returns an error to the effect of "Unable to write data to
>>>>>>>> datastore". I don't see anything else in the supporting logs
>>>>>>>> to explain the root cause of the i/o errors.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Any and all suggestions are appreciated. Thanks.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>> From the mount logs, I assume that your volume transport type is
>>>>>>> rdma. There are some known issues with rdma in 3.5.3, and the
>>>>>>> patches to address those issues have already been sent upstream
>>>>>>> [1]. From the logs, I'm not sure, and it is hard to tell you,
>>>>>>> whether this problem is related to the rdma transport or not.
>>>>>>> To make sure that the tcp transport works well in this scenario,
>>>>>>> if possible can you try to reproduce the same using a tcp-type
>>>>>>> volume. You can change the transport type of the volume with the
>>>>>>> following steps (not recommended in normal use; see the sketch
>>>>>>> after the list):
>>>>>>>
>>>>>>> 1) unmount every client
>>>>>>> 2) stop the volume
>>>>>>> 3) run gluster volume set volname config.transport tcp
>>>>>>> 4) start the volume again
>>>>>>> 5) mount the clients
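>>>>>>>
>>>>>>> Put together as a rough sketch for this volume (the mount point and
>>>>>>> mount source are assumptions; run the umount/mount on every client):
>>>>>>>
>>>>>>> umount /mnt/gluster-disk
>>>>>>> gluster volume stop gluster_disk
>>>>>>> gluster volume set gluster_disk config.transport tcp
>>>>>>> gluster volume start gluster_disk
>>>>>>> mount -t glusterfs duke-ib:/gluster_disk /mnt/gluster-disk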
>>>>>>>
>>>>>>> [1] : http://goo.gl/2PTL61
>>>>>>>
>>>>>>> Regards
>>>>>>> Rafi KC
>>>>>>>
>>>>>>>> /Jon Heese/
>>>>>>>> /Systems Engineer/
>>>>>>>> *INetU Managed Hosting*
>>>>>>>> P: 610.266.7441 x 261
>>>>>>>> F: 610.266.7434
>>>>>>>> www.inetu.net <https://www.inetu.net/>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> *From:* Jonathan Heese
>>>>>>>> *Sent:* Tuesday, March 17, 2015 12:36 PM
>>>>>>>> *To:* 'Ravishankar N'; gluster-users at gluster.org
>>>>>>>> *Subject:* RE: [Gluster-users] I/O error on replicated volume
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Ravi,
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> The last lines in the mount log before the massive vomit of I/O
>>>>>>>> errors are from 22 minutes prior, and seem innocuous to me:
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> [2015-03-16 01:37:07.126340] E
>>>>>>>> [client-handshake.c:1760:client_query_portmap_cbk]
>>>>>>>> 0-gluster_disk-client-0: failed to get the port number for
>>>>>>>> remote subvolume. Please run 'gluster volume status' on server
>>>>>>>> to see if brick process is running.
>>>>>>>>
>>>>>>>> [2015-03-16 01:37:07.126587] W [rdma.c:4273:gf_rdma_disconnect]
>>>>>>>> (-->/usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0x13f) [0x7fd9c557bccf]
>>>>>>>> (-->/usr/lib64/libgfrpc.so.0(rpc_clnt_handle_reply+0xa5) [0x7fd9c557a995]
>>>>>>>> (-->/usr/lib64/glusterfs/3.5.3/xlator/protocol/client.so(client_query_portmap_cbk+0x1ea) [0x7fd9c0d8fb9a])))
>>>>>>>> 0-gluster_disk-client-0: disconnect called (peer:10.10.10.1:24008)
>>>>>>>>
>>>>>>>> [2015-03-16 01:37:07.126687] E
>>>>>>>> [client-handshake.c:1760:client_query_portmap_cbk]
>>>>>>>> 0-gluster_disk-client-1: failed to get the port number for
>>>>>>>> remote subvolume. Please run 'gluster volume status' on server
>>>>>>>> to see if brick process is running.
>>>>>>>>
>>>>>>>> [2015-03-16 01:37:07.126737] W [rdma.c:4273:gf_rdma_disconnect]
>>>>>>>> (-->/usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0x13f) [0x7fd9c557bccf]
>>>>>>>> (-->/usr/lib64/libgfrpc.so.0(rpc_clnt_handle_reply+0xa5) [0x7fd9c557a995]
>>>>>>>> (-->/usr/lib64/glusterfs/3.5.3/xlator/protocol/client.so(client_query_portmap_cbk+0x1ea) [0x7fd9c0d8fb9a])))
>>>>>>>> 0-gluster_disk-client-1: disconnect called (peer:10.10.10.2:24008)
>>>>>>>>
>>>>>>>> [2015-03-16 01:37:10.730165] I [rpc-clnt.c:1729:rpc_clnt_reconfig]
>>>>>>>> 0-gluster_disk-client-0: changing port to 49152 (from 0)
>>>>>>>>
>>>>>>>> [2015-03-16 01:37:10.730276] W [rdma.c:4273:gf_rdma_disconnect]
>>>>>>>> (-->/usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0x13f) [0x7fd9c557bccf]
>>>>>>>> (-->/usr/lib64/libgfrpc.so.0(rpc_clnt_handle_reply+0xa5) [0x7fd9c557a995]
>>>>>>>> (-->/usr/lib64/glusterfs/3.5.3/xlator/protocol/client.so(client_query_portmap_cbk+0x1ea) [0x7fd9c0d8fb9a])))
>>>>>>>> 0-gluster_disk-client-0: disconnect called (peer:10.10.10.1:24008)
>>>>>>>>
>>>>>>>> [2015-03-16 01:37:10.739500] I [rpc-clnt.c:1729:rpc_clnt_reconfig]
>>>>>>>> 0-gluster_disk-client-1: changing port to 49152 (from 0)
>>>>>>>>
>>>>>>>> [2015-03-16 01:37:10.739560] W [rdma.c:4273:gf_rdma_disconnect]
>>>>>>>> (-->/usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0x13f) [0x7fd9c557bccf]
>>>>>>>> (-->/usr/lib64/libgfrpc.so.0(rpc_clnt_handle_reply+0xa5) [0x7fd9c557a995]
>>>>>>>> (-->/usr/lib64/glusterfs/3.5.3/xlator/protocol/client.so(client_query_portmap_cbk+0x1ea) [0x7fd9c0d8fb9a])))
>>>>>>>> 0-gluster_disk-client-1: disconnect called (peer:10.10.10.2:24008)
>>>>>>>>
>>>>>>>> [2015-03-16 01:37:10.741883] I
>>>>>>>> [client-handshake.c:1677:select_server_supported_programs]
>>>>>>>> 0-gluster_disk-client-0: Using Program GlusterFS 3.3, Num
>>>>>>>> (1298437), Version (330)
>>>>>>>>
>>>>>>>> [2015-03-16 01:37:10.744524] I
>>>>>>>> [client-handshake.c:1462:client_setvolume_cbk]
>>>>>>>> 0-gluster_disk-client-0: Connected to 10.10.10.1:49152,
>>>>>>>> attached to remote volume '/bricks/brick1'.
>>>>>>>>
>>>>>>>> [2015-03-16 01:37:10.744537] I
>>>>>>>> [client-handshake.c:1474:client_setvolume_cbk]
>>>>>>>> 0-gluster_disk-client-0: Server and Client lk-version numbers
>>>>>>>> are not same, reopening the fds
>>>>>>>>
>>>>>>>> [2015-03-16 01:37:10.744566] I [afr-common.c:4267:afr_notify]
>>>>>>>> 0-gluster_disk-replicate-0: Subvolume 'gluster_disk-client-0'
>>>>>>>> came back up; going online.
>>>>>>>>
>>>>>>>> [2015-03-16 01:37:10.744627] I
>>>>>>>> [client-handshake.c:450:client_set_lk_version_cbk]
>>>>>>>> 0-gluster_disk-client-0: Server lk version = 1
>>>>>>>>
>>>>>>>> [2015-03-16 01:37:10.753037] I
>>>>>>>> [client-handshake.c:1677:select_server_supported_programs]
>>>>>>>> 0-gluster_disk-client-1: Using Program GlusterFS 3.3, Num
>>>>>>>> (1298437), Version (330)
>>>>>>>>
>>>>>>>> [2015-03-16 01:37:10.755657] I
>>>>>>>> [client-handshake.c:1462:client_setvolume_cbk]
>>>>>>>> 0-gluster_disk-client-1: Connected to 10.10.10.2:49152,
>>>>>>>> attached to remote volume '/bricks/brick1'.
>>>>>>>>
>>>>>>>> [2015-03-16 01:37:10.755676] I
>>>>>>>> [client-handshake.c:1474:client_setvolume_cbk]
>>>>>>>> 0-gluster_disk-client-1: Server and Client lk-version numbers
>>>>>>>> are not same, reopening the fds
>>>>>>>>
>>>>>>>> [2015-03-16 01:37:10.761945] I
>>>>>>>> [fuse-bridge.c:5016:fuse_graph_setup] 0-fuse: switched to graph 0
>>>>>>>>
>>>>>>>> [2015-03-16 01:37:10.762144] I
>>>>>>>> [client-handshake.c:450:client_set_lk_version_cbk]
>>>>>>>> 0-gluster_disk-client-1: Server lk version = 1
>>>>>>>>
>>>>>>>> [*2015-03-16 01:37:10.762279*] I [fuse-bridge.c:3953:fuse_init]
>>>>>>>> 0-glusterfs-fuse: FUSE inited with protocol versions: glusterfs
>>>>>>>> 7.22 kernel 7.14
>>>>>>>>
>>>>>>>> [*2015-03-16 01:59:26.098670*] W
>>>>>>>> [fuse-bridge.c:2242:fuse_writev_cbk] 0-glusterfs-fuse: 292084:
>>>>>>>> WRITE => -1 (Input/output error)
>>>>>>>>
>>>>>>>> ...
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> I've seen no indication of split-brain on any files at any
>>>>>>>> point in this (ever since downdating from 3.6.2 to 3.5.3, which
>>>>>>>> is when this particular issue started):
>>>>>>>>
>>>>>>>> [root at duke gfapi-module-for-linux-target-driver-]# gluster v
>>>>>>>> heal gluster_disk info
>>>>>>>>
>>>>>>>> Brick duke.jonheese.local:/bricks/brick1/
>>>>>>>>
>>>>>>>> Number of entries: 0
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Brick duchess.jonheese.local:/bricks/brick1/
>>>>>>>>
>>>>>>>> Number of entries: 0
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Thanks.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> /Jon Heese/
>>>>>>>> /Systems Engineer/
>>>>>>>> *INetU Managed Hosting*
>>>>>>>> P: 610.266.7441 x 261
>>>>>>>> F: 610.266.7434
>>>>>>>> www.inetu.net <https://www.inetu.net/>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> *From:* Ravishankar N [mailto:ravishankar at redhat.com]
>>>>>>>> *Sent:* Tuesday, March 17, 2015 12:35 AM
>>>>>>>> *To:* Jonathan Heese; gluster-users at gluster.org
>>>>>>>> *Subject:* Re: [Gluster-users] I/O error on replicated volume
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On 03/17/2015 02:14 AM, Jonathan Heese wrote:
>>>>>>>>
>>>>>>>> Hello,
>>>>>>>>
>>>>>>>> So I resolved my previous issue with split-brains and the
>>>>>>>> lack of self-healing by dropping my installed glusterfs*
>>>>>>>> packages from 3.6.2 to 3.5.3, but now I've picked up a new
>>>>>>>> issue, which actually makes normal use of the volume
>>>>>>>> practically impossible.
>>>>>>>>
>>>>>>>> A little background for those not already paying close
>>>>>>>> attention:
>>>>>>>> I have a 2 node 2 brick replicating volume whose purpose in
>>>>>>>> life is to hold iSCSI target files, primarily for use to
>>>>>>>> provide datastores to a VMware ESXi cluster. The plan is
>>>>>>>> to put a handful of image files on the Gluster volume,
>>>>>>>> mount them locally on both Gluster nodes, and run tgtd on
>>>>>>>> both, pointed to the image files on the mounted gluster
>>>>>>>> volume. Then the ESXi boxes will use multipath
>>>>>>>> (active/passive) iSCSI to connect to the nodes, with
>>>>>>>> automatic failover in case of planned or unplanned downtime
>>>>>>>> of the Gluster nodes.
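>>>>>>>>
>>>>>>>> (For illustration, a minimal tgtd export of one such image file in
>>>>>>>> /etc/tgt/targets.conf would look something like the following; the
>>>>>>>> IQN and image path here are just placeholders:)
>>>>>>>>
>>>>>>>> <target iqn.2015-03.local.jonheese:datastore1>
>>>>>>>>     backing-store /mnt/gluster-disk/datastore1.img
>>>>>>>> </target>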
>>>>>>>>
>>>>>>>> In my most recent round of testing with 3.5.3, I'm seeing a
>>>>>>>> massive failure to write data to the volume after about
>>>>>>>> 5-10 minutes, so I've simplified the scenario a bit (to
>>>>>>>> minimize the variables) to: both Gluster nodes up, only one
>>>>>>>> node (duke) mounted and running tgtd, and just regular
>>>>>>>> (single path) iSCSI from a single ESXi server.
>>>>>>>>
>>>>>>>> About 5-10 minutes into migrating a VM onto the test
>>>>>>>> datastore, /var/log/messages on duke gets blasted with a
>>>>>>>> ton of messages exactly like this:
>>>>>>>>
>>>>>>>> Mar 15 22:24:06 duke tgtd: bs_rdwr_request(180) io error
>>>>>>>> 0x1781e00 2a -1 512 22971904, Input/output error
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> And /var/log/glusterfs/mnt-gluster_disk.log gets blasted
>>>>>>>> with a ton of messages exactly like this:
>>>>>>>>
>>>>>>>> [2015-03-16 02:24:07.572279] W
>>>>>>>> [fuse-bridge.c:2242:fuse_writev_cbk] 0-glusterfs-fuse:
>>>>>>>> 635299: WRITE => -1 (Input/output error)
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Are there any messages in the mount log from AFR about
>>>>>>>> split-brain just before the above line appears?
>>>>>>>> Does `gluster v heal <VOLNAME> info` show any files? Performing
>>>>>>>> I/O on files that are in split-brain fails with EIO.
>>>>>>>>
>>>>>>>> -Ravi
>>>>>>>>
>>>>>>>> And the write operation from VMware's side fails as soon as
>>>>>>>> these messages start.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> I don't see any other errors (in the log files I know of)
>>>>>>>> indicating the root cause of these i/o errors. I'm sure
>>>>>>>> that this is not enough information to tell what's going
>>>>>>>> on, but can anyone help me figure out what to look at next
>>>>>>>> to figure this out?
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> I've also considered using Dan Lambright's libgfapi gluster
>>>>>>>> module for tgtd (or something similar) to avoid going
>>>>>>>> through FUSE, but I'm not sure whether that would be
>>>>>>>> irrelevant to this problem, since I'm not 100% sure if it
>>>>>>>> lies in FUSE or elsewhere.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Thanks!
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> /Jon Heese/
>>>>>>>> /Systems Engineer/
>>>>>>>> *INetU Managed Hosting*
>>>>>>>> P: 610.266.7441 x 261
>>>>>>>> F: 610.266.7434
>>>>>>>> www.inetu.net <https://www.inetu.net/>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> _______________________________________________
>>>>>>>> Gluster-users mailing list
>>>>>>>> Gluster-users at gluster.org
>>>>>>>>
http://www.gluster.org/mailman/listinfo/gluster-users
>>>>>>>
>>>>>
>>>
>>