Mohammed,
I have completed the steps you suggested (unmount all, stop the volume, set the
config.transport to tcp, start the volume, mount, etc.), and the behavior has
indeed changed.
[root@duke ~]# gluster volume info
Volume Name: gluster_disk
Type: Replicate
Volume ID: 2307a5a8-641e-44f4-8eaf-7cc2b704aafd
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: duke-ib:/bricks/brick1
Brick2: duchess-ib:/bricks/brick1
Options Reconfigured:
config.transport: tcp
[root@duke ~]# gluster volume status
Status of volume: gluster_disk
Gluster process                                 Port    Online  Pid
------------------------------------------------------------------------------
Brick duke-ib:/bricks/brick1                    49152   Y       16362
Brick duchess-ib:/bricks/brick1                 49152   Y       14155
NFS Server on localhost                         2049    Y       16374
Self-heal Daemon on localhost                   N/A     Y       16381
NFS Server on duchess-ib                        2049    Y       14167
Self-heal Daemon on duchess-ib                  N/A     Y       14174

Task Status of Volume gluster_disk
------------------------------------------------------------------------------
There are no active volume tasks
I am no longer seeing the I/O errors during prolonged periods of write I/O that
I was seeing when the transport was set to rdma. However, I am seeing this
message on both nodes every 3 seconds (almost exactly):
==> /var/log/glusterfs/nfs.log <==
[2015-03-21 14:17:40.379719] W
[rdma.c:1076:gf_rdma_cm_event_handler] 0-gluster_disk-client-1: cma event
RDMA_CM_EVENT_REJECTED, error 8 (me:10.10.10.1:1023 peer:10.10.10.2:49152)
Is this something to worry about? Any idea why there are rdma pieces in play
when I've set my transport to tcp? The actual I/O appears to be handled
properly and I've seen no further errors in the testing I've done so
far.
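To check whether the rejections really arrive on a fixed 3-second cadence, the gaps between consecutive messages can be measured. The following is only a rough sketch: rejected_gaps is a hypothetical helper name, and it assumes each log entry sits on a single line that starts with the [date time] stamp, as in nfs.log. It does not handle entries that span midnight.

```shell
#!/usr/bin/env bash
# Sketch: print the number of seconds between consecutive
# RDMA_CM_EVENT_REJECTED entries in a GlusterFS log.
# Assumption: one log entry per line, "[YYYY-MM-DD HH:MM:SS.micro]" first.
rejected_gaps() {
    awk '/RDMA_CM_EVENT_REJECTED/ {
        # $2 looks like "14:17:40.379719]"; drop the bracket, split on ":"
        t = $2
        sub(/\]$/, "", t)
        split(t, p, ":")
        now = p[1] * 3600 + p[2] * 60 + p[3]
        # Print a gap only from the second matching entry onwards
        if (seen++) printf "%.1f\n", now - prev
        prev = now
    }' "$@"
}
```

Something like `rejected_gaps /var/log/glusterfs/nfs.log | sort -n | uniq -c` would then give a rough histogram of the intervals.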
Thanks.
Regards,
Jon Heese
________________________________
From: gluster-users-bounces at gluster.org <gluster-users-bounces at
gluster.org> on behalf of Jonathan Heese <jheese at inetu.net>
Sent: Friday, March 20, 2015 7:04 AM
To: Mohammed Rafi K C
Cc: gluster-users
Subject: Re: [Gluster-users] I/O error on replicated volume
Mohammed,
Thanks very much for the reply. I will try that and report back.
Regards,
Jon Heese
On Mar 20, 2015, at 3:26 AM, "Mohammed Rafi K C" <rkavunga at redhat.com> wrote:
On 03/19/2015 10:16 PM, Jonathan Heese wrote:
Hello all,
Does anyone else have any further suggestions for troubleshooting this?
To sum up: I have a 2 node 2 brick replicated volume, which holds a handful of
iSCSI image files which are mounted and served up by tgtd (CentOS 6) to a
handful of devices on a dedicated iSCSI network. The most important iSCSI
clients (initiators) are four VMware ESXi 5.5 hosts that use the iSCSI volumes
as backing for their datastores for virtual machine storage.
After a few minutes of sustained writing to the volume, I am seeing a massive
flood (over 1500 per second at times) of this error in
/var/log/glusterfs/mnt-gluster-disk.log:
[2015-03-16 02:24:07.582801] W [fuse-bridge.c:2242:fuse_writev_cbk]
0-glusterfs-fuse: 635358: WRITE => -1 (Input/output error)
When this happens, the ESXi box fails its write operation and returns an error
to the effect of "Unable to write data to datastore". I don't see anything else
in the supporting logs to explain the root cause of the I/O errors.
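To put a number on the flood, the WRITE failures in the mount log can be bucketed per second. This is a minimal sketch rather than a tested tool: write_errors_per_second is a hypothetical helper name, and it assumes one log entry per line in the format shown in the mount log.

```shell
#!/usr/bin/env bash
# Sketch: count fuse_writev_cbk "Input/output error" entries per second.
# Assumption: one log entry per line, "[YYYY-MM-DD HH:MM:SS.micro]" first.
write_errors_per_second() {
    awk '/fuse_writev_cbk/ && /Input\/output error/ {
        # Key on the date plus HH:MM:SS, dropping the fractional seconds
        key = substr($1, 2) " " substr($2, 1, 8)
        count[key]++
    }
    END { for (k in count) print count[k], k }' "$@" | sort -k2,3
}
```

Run against the mount log (for example `write_errors_per_second /var/log/glusterfs/mnt-gluster_disk.log`), it prints one "count date time" line per second in which errors occurred.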
Any and all suggestions are appreciated. Thanks.
From the mount logs, I assume that your volume transport type is rdma. There
are some known issues with rdma in 3.5.3, and the patches to address those
issues have already been sent upstream [1]. From the logs alone, it is hard to
tell whether this problem is related to the rdma transport or not. To check
whether the tcp transport works well in this scenario, can you try to
reproduce the problem using a tcp type volume, if possible? You can change the
transport type of a volume with the following steps (not recommended in the
normal use case):
1) unmount every client
2) stop the volume
3) run gluster volume set volname config.transport tcp
4) start the volume again
5) mount the clients
[1] : http://goo.gl/2PTL61
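The five steps above could be scripted roughly as follows. This is a sketch under assumptions, not a verified procedure: the volume name gluster_disk and server duke-ib are taken from this thread, the mount point /mnt/gluster_disk is a guess based on the mount log file name, and the run and switch_transport_to_tcp helpers are hypothetical. With DRY_RUN=1 (the default here) it only prints the commands. Steps 1 and 5 have to be repeated on every client.

```shell
#!/usr/bin/env bash
# Sketch: switch a volume's transport to tcp, following steps 1) to 5).
# All three values below are placeholders; adjust them for the real setup.
VOLNAME="${VOLNAME:-gluster_disk}"
SERVER="${SERVER:-duke-ib}"
MOUNTPOINT="${MOUNTPOINT:-/mnt/gluster_disk}"   # assumed mount point
DRY_RUN="${DRY_RUN:-1}"                          # 1 = print only, 0 = execute

# Print the command in dry-run mode, otherwise execute it
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

switch_transport_to_tcp() {
    run umount "$MOUNTPOINT"                                   # 1) on every client
    run gluster --mode=script volume stop "$VOLNAME"           # 2) no confirmation prompt
    run gluster volume set "$VOLNAME" config.transport tcp     # 3)
    run gluster volume start "$VOLNAME"                        # 4)
    run mount -t glusterfs "$SERVER:/$VOLNAME" "$MOUNTPOINT"   # 5) on every client
}

switch_transport_to_tcp
```

Reviewing the dry-run output before setting DRY_RUN=0 seems prudent, since stopping the volume interrupts every client.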
Regards
Rafi KC
Jon Heese
Systems Engineer
INetU Managed Hosting
P: 610.266.7441 x 261
F: 610.266.7434
www.inetu.net
** This message contains confidential information, which also may be privileged,
and is intended only for the person(s) addressed above. Any unauthorized use,
distribution, copying or disclosure of confidential and/or privileged
information is strictly prohibited. If you have received this communication in
error, please erase all copies of the message and its attachments and notify the
sender immediately via reply e-mail. **
From: Jonathan Heese
Sent: Tuesday, March 17, 2015 12:36 PM
To: 'Ravishankar N'; gluster-users at gluster.org
Subject: RE: [Gluster-users] I/O error on replicated volume
Ravi,
The last lines in the mount log before the massive vomit of I/O errors are from
22 minutes prior, and seem innocuous to me:
[2015-03-16 01:37:07.126340] E
[client-handshake.c:1760:client_query_portmap_cbk] 0-gluster_disk-client-0:
failed to get the port number for remote subvolume. Please run 'gluster
volume status' on server to see if brick process is running.
[2015-03-16 01:37:07.126587] W [rdma.c:4273:gf_rdma_disconnect]
(-->/usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0x13f) [0x7fd9c557bccf]
(-->/usr/lib64/libgfrpc.so.0(rpc_clnt_handle_reply+0xa5) [0x7fd9c557a995]
(-->/usr/lib64/glusterfs/3.5.3/xlator/protocol/client.so(client_query_portmap_cbk+0x1ea)
[0x7fd9c0d8fb9a]))) 0-gluster_disk-client-0: disconnect called
(peer:10.10.10.1:24008)
[2015-03-16 01:37:07.126687] E
[client-handshake.c:1760:client_query_portmap_cbk] 0-gluster_disk-client-1:
failed to get the port number for remote subvolume. Please run 'gluster
volume status' on server to see if brick process is running.
[2015-03-16 01:37:07.126737] W [rdma.c:4273:gf_rdma_disconnect]
(-->/usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0x13f) [0x7fd9c557bccf]
(-->/usr/lib64/libgfrpc.so.0(rpc_clnt_handle_reply+0xa5) [0x7fd9c557a995]
(-->/usr/lib64/glusterfs/3.5.3/xlator/protocol/client.so(client_query_portmap_cbk+0x1ea)
[0x7fd9c0d8fb9a]))) 0-gluster_disk-client-1: disconnect called
(peer:10.10.10.2:24008)
[2015-03-16 01:37:10.730165] I [rpc-clnt.c:1729:rpc_clnt_reconfig]
0-gluster_disk-client-0: changing port to 49152 (from 0)
[2015-03-16 01:37:10.730276] W [rdma.c:4273:gf_rdma_disconnect]
(-->/usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0x13f) [0x7fd9c557bccf]
(-->/usr/lib64/libgfrpc.so.0(rpc_clnt_handle_reply+0xa5) [0x7fd9c557a995]
(-->/usr/lib64/glusterfs/3.5.3/xlator/protocol/client.so(client_query_portmap_cbk+0x1ea)
[0x7fd9c0d8fb9a]))) 0-gluster_disk-client-0: disconnect called
(peer:10.10.10.1:24008)
[2015-03-16 01:37:10.739500] I [rpc-clnt.c:1729:rpc_clnt_reconfig]
0-gluster_disk-client-1: changing port to 49152 (from 0)
[2015-03-16 01:37:10.739560] W [rdma.c:4273:gf_rdma_disconnect]
(-->/usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0x13f) [0x7fd9c557bccf]
(-->/usr/lib64/libgfrpc.so.0(rpc_clnt_handle_reply+0xa5) [0x7fd9c557a995]
(-->/usr/lib64/glusterfs/3.5.3/xlator/protocol/client.so(client_query_portmap_cbk+0x1ea)
[0x7fd9c0d8fb9a]))) 0-gluster_disk-client-1: disconnect called
(peer:10.10.10.2:24008)
[2015-03-16 01:37:10.741883] I
[client-handshake.c:1677:select_server_supported_programs]
0-gluster_disk-client-0: Using Program GlusterFS 3.3, Num (1298437), Version
(330)
[2015-03-16 01:37:10.744524] I [client-handshake.c:1462:client_setvolume_cbk]
0-gluster_disk-client-0: Connected to 10.10.10.1:49152, attached to remote
volume '/bricks/brick1'.
[2015-03-16 01:37:10.744537] I [client-handshake.c:1474:client_setvolume_cbk]
0-gluster_disk-client-0: Server and Client lk-version numbers are not same,
reopening the fds
[2015-03-16 01:37:10.744566] I [afr-common.c:4267:afr_notify]
0-gluster_disk-replicate-0: Subvolume 'gluster_disk-client-0' came back
up; going online.
[2015-03-16 01:37:10.744627] I
[client-handshake.c:450:client_set_lk_version_cbk] 0-gluster_disk-client-0:
Server lk version = 1
[2015-03-16 01:37:10.753037] I
[client-handshake.c:1677:select_server_supported_programs]
0-gluster_disk-client-1: Using Program GlusterFS 3.3, Num (1298437), Version
(330)
[2015-03-16 01:37:10.755657] I [client-handshake.c:1462:client_setvolume_cbk]
0-gluster_disk-client-1: Connected to 10.10.10.2:49152, attached to remote
volume '/bricks/brick1'.
[2015-03-16 01:37:10.755676] I [client-handshake.c:1474:client_setvolume_cbk]
0-gluster_disk-client-1: Server and Client lk-version numbers are not same,
reopening the fds
[2015-03-16 01:37:10.761945] I [fuse-bridge.c:5016:fuse_graph_setup] 0-fuse:
switched to graph 0
[2015-03-16 01:37:10.762144] I
[client-handshake.c:450:client_set_lk_version_cbk] 0-gluster_disk-client-1:
Server lk version = 1
[2015-03-16 01:37:10.762279] I [fuse-bridge.c:3953:fuse_init] 0-glusterfs-fuse:
FUSE inited with protocol versions: glusterfs 7.22 kernel 7.14
[2015-03-16 01:59:26.098670] W [fuse-bridge.c:2242:fuse_writev_cbk]
0-glusterfs-fuse: 292084: WRITE => -1 (Input/output error)
...
I've seen no indication of split-brain on any files at any point in this (ever
since downgrading from 3.6.2 to 3.5.3, which is when this particular issue
started):
[root@duke gfapi-module-for-linux-target-driver-]# gluster v heal
gluster_disk info
Brick duke.jonheese.local:/bricks/brick1/
Number of entries: 0
Brick duchess.jonheese.local:/bricks/brick1/
Number of entries: 0
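For repeated checks, the per-brick counts in this output can be summed into a single number. This is a small sketch that assumes the "Number of entries: N" line format shown above; heal_entry_total is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: add up the "Number of entries: N" lines from heal info output.
# Assumption: the output format matches the listing in this message.
heal_entry_total() {
    awk -F': *' '/^Number of entries:/ { total += $2 }
                 END { print total + 0 }' "$@"
}
```

It could be fed from the CLI, for example `gluster volume heal gluster_disk info | heal_entry_total`; a non-zero total means some files are still listed by the heal command.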
Thanks.
Jon Heese
From: Ravishankar N [mailto:ravishankar at redhat.com]
Sent: Tuesday, March 17, 2015 12:35 AM
To: Jonathan Heese; gluster-users at gluster.org
Subject: Re: [Gluster-users] I/O error on replicated volume
On 03/17/2015 02:14 AM, Jonathan Heese wrote:
Hello,
So I resolved my previous issue with split-brains and the lack of self-healing
by dropping my installed glusterfs* packages from 3.6.2 to 3.5.3, but now
I've picked up a new issue, which actually makes normal use of the volume
practically impossible.
A little background for those not already paying close attention:
I have a 2 node 2 brick replicating volume whose purpose in life is to hold
iSCSI target files, primarily for use to provide datastores to a VMware ESXi
cluster. The plan is to put a handful of image files on the Gluster volume,
mount them locally on both Gluster nodes, and run tgtd on both, pointed to the
image files on the mounted gluster volume. Then the ESXi boxes will use
multipath (active/passive) iSCSI to connect to the nodes, with automatic
failover in case of planned or unplanned downtime of the Gluster nodes.
In my most recent round of testing with 3.5.3, I'm seeing a massive failure
to write data to the volume after about 5-10 minutes, so I've simplified the
scenario a bit (to minimize the variables) to: both Gluster nodes up, only one
node (duke) mounted and running tgtd, and just regular (single path) iSCSI from
a single ESXi server.
About 5-10 minutes into migrating a VM onto the test datastore,
/var/log/messages on duke gets blasted with a ton of messages exactly like this:
Mar 15 22:24:06 duke tgtd: bs_rdwr_request(180) io error 0x1781e00 2a -1 512
22971904, Input/output error
And /var/log/glusterfs/mnt-gluster_disk.log gets blasted with a ton of messages
exactly like this:
[2015-03-16 02:24:07.572279] W [fuse-bridge.c:2242:fuse_writev_cbk]
0-glusterfs-fuse: 635299: WRITE => -1 (Input/output error)
Are there any messages in the mount log from AFR about split-brain just before
the above line appears?
Does `gluster v heal <VOLNAME> info` show any files? Performing I/O on
files that are in split-brain fails with EIO.
-Ravi
And the write operation from VMware's side fails as soon as these messages
start.
I don't see any other errors (in the log files I know of) indicating the
root cause of these i/o errors. I'm sure that this is not enough
information to tell what's going on, but can anyone help me figure out what
to look at next to figure this out?
I've also considered using Dan Lambright's libgfapi gluster module for
tgtd (or something similar) to avoid going through FUSE, but I'm not sure
whether that would be irrelevant to this problem, since I'm not 100% sure if
it lies in FUSE or elsewhere.
Thanks!
Jon Heese
_______________________________________________
Gluster-users mailing list
Gluster-users at gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users