On 03/21/2015 07:49 PM, Jonathan Heese wrote:
> Mohammed,
>
>
> I have completed the steps you suggested (unmount all, stop the
> volume, set the config.transport to tcp, start the volume, mount,
> etc.), and the behavior has indeed changed.
>
>
> [root@duke ~]# gluster volume info
>
> Volume Name: gluster_disk
> Type: Replicate
> Volume ID: 2307a5a8-641e-44f4-8eaf-7cc2b704aafd
> Status: Started
> Number of Bricks: 1 x 2 = 2
> Transport-type: tcp
> Bricks:
> Brick1: duke-ib:/bricks/brick1
> Brick2: duchess-ib:/bricks/brick1
> Options Reconfigured:
> config.transport: tcp
>
>
> [root@duke ~]# gluster volume status
> Status of volume: gluster_disk
> Gluster process                                 Port    Online  Pid
> ------------------------------------------------------------------------------
> Brick duke-ib:/bricks/brick1                    49152   Y       16362
> Brick duchess-ib:/bricks/brick1                 49152   Y       14155
> NFS Server on localhost                         2049    Y       16374
> Self-heal Daemon on localhost                   N/A     Y       16381
> NFS Server on duchess-ib                        2049    Y       14167
> Self-heal Daemon on duchess-ib                  N/A     Y       14174
>
> Task Status of Volume gluster_disk
> ------------------------------------------------------------------------------
> There are no active volume tasks
>
> I am no longer seeing the I/O errors during prolonged periods of write
> I/O that I was seeing when the transport was set to rdma. However, I
> am seeing this message on both nodes every 3 seconds (almost exactly):
>
>
> ==> /var/log/glusterfs/nfs.log <==
> [2015-03-21 14:17:40.379719] W [rdma.c:1076:gf_rdma_cm_event_handler]
> 0-gluster_disk-client-1: cma event RDMA_CM_EVENT_REJECTED, error 8
> (me:10.10.10.1:1023 peer:10.10.10.2:49152)
>
>
> Is this something to worry about?
>
If you are not using NFS to export the volumes, there is nothing to
worry about.
> Any idea why there are rdma pieces in play when I've set my transport
> to tcp?
>
There should not be any RDMA pieces in play. If possible, can you paste
the volfile for the NFS server? You can find the volfile in
/var/lib/glusterd/nfs/nfs-server.vol or
/usr/local/var/lib/glusterd/nfs/nfs-server.vol.
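
As a quick check, grepping that volfile for rdma references will show
whether the NFS server graph was generated with the old transport (a
simple sketch; adjust the path to wherever the volfile lives on your
build):

  grep -n -i rdma /var/lib/glusterd/nfs/nfs-server.vol

If a client xlator in that file still carries "option transport-type
rdma", the NFS process is likely still dialing the bricks over rdma,
which would explain the RDMA_CM_EVENT_REJECTED messages; restarting the
volume (or glusterd) should regenerate the volfile with tcp.
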
Rafi KC
> The actual I/O appears to be handled properly and I've seen no further
> errors in the testing I've done so far.
>
>
> Thanks.
>
>
> Regards,
>
> Jon Heese
>
>
> ------------------------------------------------------------------------
> *From:* gluster-users-bounces at gluster.org
> <gluster-users-bounces at gluster.org> on behalf of Jonathan Heese
> <jheese at inetu.net>
> *Sent:* Friday, March 20, 2015 7:04 AM
> *To:* Mohammed Rafi K C
> *Cc:* gluster-users
> *Subject:* Re: [Gluster-users] I/O error on replicated volume
>
> Mohammed,
>
> Thanks very much for the reply. I will try that and report back.
>
> Regards,
> Jon Heese
>
> On Mar 20, 2015, at 3:26 AM, "Mohammed Rafi K C" <rkavunga at redhat.com> wrote:
>
>>
>> On 03/19/2015 10:16 PM, Jonathan Heese wrote:
>>>
>>> Hello all,
>>>
>>>
>>>
>>> Does anyone else have any further suggestions for troubleshooting this?
>>>
>>>
>>>
>>> To sum up: I have a 2-node, 2-brick replicated volume, which holds a
>>> handful of iSCSI image files which are mounted and served up by tgtd
>>> (CentOS 6) to a handful of devices on a dedicated iSCSI network.
>>> The most important iSCSI clients (initiators) are four VMware ESXi
>>> 5.5 hosts that use the iSCSI volumes as backing for their datastores
>>> for virtual machine storage.
>>>
>>>
>>>
>>> After a few minutes of sustained writing to the volume, I am seeing
>>> a massive flood (over 1500 per second at times) of this error in
>>> /var/log/glusterfs/mnt-gluster-disk.log:
>>>
>>> [2015-03-16 02:24:07.582801] W [fuse-bridge.c:2242:fuse_writev_cbk]
>>> 0-glusterfs-fuse: 635358: WRITE => -1 (Input/output error)
>>>
>>>
>>>
>>> When this happens, the ESXi box fails its write operation and
>>> returns an error to the effect of "Unable to write data to
>>> datastore". I don't see anything else in the supporting logs to
>>> explain the root cause of the I/O errors.
>>>
>>>
>>>
>>> Any and all suggestions are appreciated. Thanks.
>>>
>>>
>>>
>>
>> From the mount logs, I assume that your volume transport type is
>> rdma. There are some known issues with rdma in 3.5.3, and patches
>> to address those issues have already been sent upstream [1]. From
>> the logs alone, it is hard to tell whether this problem is related
>> to the rdma transport or not. To confirm that the tcp transport
>> works well in this scenario, can you try to reproduce the issue
>> using a tcp-type volume? You can change the transport type of a
>> volume with the following steps (not recommended in normal use
>> cases); a concrete command sketch follows the list:
>>
>> 1) unmount every client
>> 2) stop the volume
>> 3) run gluster volume set volname config.transport tcp
>> 4) start the volume again
>> 5) mount the clients
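>>
>> As a minimal sketch of those steps (assuming the volume name
>> gluster_disk and a client FUSE mount at /mnt/gluster_disk, as used
>> elsewhere in this thread):
>>
>>   # on every client
>>   umount /mnt/gluster_disk
>>
>>   # on one of the servers
>>   gluster volume stop gluster_disk
>>   gluster volume set gluster_disk config.transport tcp
>>   gluster volume start gluster_disk
>>
>>   # on every client again
>>   mount -t glusterfs duke-ib:/gluster_disk /mnt/gluster_disk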
>>
>> [1] : http://goo.gl/2PTL61
>>
>> Regards
>> Rafi KC
>>
>>> /Jon Heese/
>>> /Systems Engineer/
>>> *INetU Managed Hosting*
>>> P: 610.266.7441 x 261
>>> F: 610.266.7434
>>> www.inetu.net
>>>
>>> /** This message contains confidential information, which also may
>>> be privileged, and is intended only for the person(s) addressed
>>> above. Any unauthorized use, distribution, copying or disclosure of
>>> confidential and/or privileged information is strictly prohibited.
>>> If you have received this communication in error, please erase all
>>> copies of the message and its attachments and notify the sender
>>> immediately via reply e-mail. **/
>>>
>>>
>>>
>>> *From:* Jonathan Heese
>>> *Sent:* Tuesday, March 17, 2015 12:36 PM
>>> *To:* 'Ravishankar N'; gluster-users at gluster.org
>>> *Subject:* RE: [Gluster-users] I/O error on replicated volume
>>>
>>>
>>>
>>> Ravi,
>>>
>>>
>>>
>>> The last lines in the mount log before the massive vomit of I/O
>>> errors are from 22 minutes prior, and seem innocuous to me:
>>>
>>>
>>>
>>> [2015-03-16 01:37:07.126340] E
>>> [client-handshake.c:1760:client_query_portmap_cbk]
>>> 0-gluster_disk-client-0: failed to get the port number for remote
>>> subvolume. Please run 'gluster volume status' on server to see if
>>> brick process is running.
>>>
>>> [2015-03-16 01:37:07.126587] W [rdma.c:4273:gf_rdma_disconnect]
>>> (-->/usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0x13f) [0x7fd9c557bccf]
>>> (-->/usr/lib64/libgfrpc.so.0(rpc_clnt_handle_reply+0xa5) [0x7fd9c557a995]
>>> (-->/usr/lib64/glusterfs/3.5.3/xlator/protocol/client.so(client_query_portmap_cbk+0x1ea) [0x7fd9c0d8fb9a])))
>>> 0-gluster_disk-client-0: disconnect called (peer:10.10.10.1:24008)
>>>
>>> [2015-03-16 01:37:07.126687] E
>>> [client-handshake.c:1760:client_query_portmap_cbk]
>>> 0-gluster_disk-client-1: failed to get the port number for remote
>>> subvolume. Please run 'gluster volume status' on server to see if
>>> brick process is running.
>>>
>>> [2015-03-16 01:37:07.126737] W [rdma.c:4273:gf_rdma_disconnect]
>>> (-->/usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0x13f) [0x7fd9c557bccf]
>>> (-->/usr/lib64/libgfrpc.so.0(rpc_clnt_handle_reply+0xa5) [0x7fd9c557a995]
>>> (-->/usr/lib64/glusterfs/3.5.3/xlator/protocol/client.so(client_query_portmap_cbk+0x1ea) [0x7fd9c0d8fb9a])))
>>> 0-gluster_disk-client-1: disconnect called (peer:10.10.10.2:24008)
>>>
>>> [2015-03-16 01:37:10.730165] I [rpc-clnt.c:1729:rpc_clnt_reconfig]
>>> 0-gluster_disk-client-0: changing port to 49152 (from 0)
>>>
>>> [2015-03-16 01:37:10.730276] W [rdma.c:4273:gf_rdma_disconnect]
>>> (-->/usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0x13f) [0x7fd9c557bccf]
>>> (-->/usr/lib64/libgfrpc.so.0(rpc_clnt_handle_reply+0xa5) [0x7fd9c557a995]
>>> (-->/usr/lib64/glusterfs/3.5.3/xlator/protocol/client.so(client_query_portmap_cbk+0x1ea) [0x7fd9c0d8fb9a])))
>>> 0-gluster_disk-client-0: disconnect called (peer:10.10.10.1:24008)
>>>
>>> [2015-03-16 01:37:10.739500] I [rpc-clnt.c:1729:rpc_clnt_reconfig]
>>> 0-gluster_disk-client-1: changing port to 49152 (from 0)
>>>
>>> [2015-03-16 01:37:10.739560] W [rdma.c:4273:gf_rdma_disconnect]
>>> (-->/usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0x13f) [0x7fd9c557bccf]
>>> (-->/usr/lib64/libgfrpc.so.0(rpc_clnt_handle_reply+0xa5) [0x7fd9c557a995]
>>> (-->/usr/lib64/glusterfs/3.5.3/xlator/protocol/client.so(client_query_portmap_cbk+0x1ea) [0x7fd9c0d8fb9a])))
>>> 0-gluster_disk-client-1: disconnect called (peer:10.10.10.2:24008)
>>>
>>> [2015-03-16 01:37:10.741883] I
>>> [client-handshake.c:1677:select_server_supported_programs]
>>> 0-gluster_disk-client-0: Using Program GlusterFS 3.3, Num (1298437),
>>> Version (330)
>>>
>>> [2015-03-16 01:37:10.744524] I
>>> [client-handshake.c:1462:client_setvolume_cbk]
>>> 0-gluster_disk-client-0: Connected to 10.10.10.1:49152, attached to
>>> remote volume '/bricks/brick1'.
>>>
>>> [2015-03-16 01:37:10.744537] I
>>> [client-handshake.c:1474:client_setvolume_cbk]
>>> 0-gluster_disk-client-0: Server and Client lk-version numbers are
>>> not same, reopening the fds
>>>
>>> [2015-03-16 01:37:10.744566] I [afr-common.c:4267:afr_notify]
>>> 0-gluster_disk-replicate-0: Subvolume 'gluster_disk-client-0' came
>>> back up; going online.
>>>
>>> [2015-03-16 01:37:10.744627] I
>>> [client-handshake.c:450:client_set_lk_version_cbk]
>>> 0-gluster_disk-client-0: Server lk version = 1
>>>
>>> [2015-03-16 01:37:10.753037] I
>>> [client-handshake.c:1677:select_server_supported_programs]
>>> 0-gluster_disk-client-1: Using Program GlusterFS 3.3, Num (1298437),
>>> Version (330)
>>>
>>> [2015-03-16 01:37:10.755657] I
>>> [client-handshake.c:1462:client_setvolume_cbk]
>>> 0-gluster_disk-client-1: Connected to 10.10.10.2:49152, attached to
>>> remote volume '/bricks/brick1'.
>>>
>>> [2015-03-16 01:37:10.755676] I
>>> [client-handshake.c:1474:client_setvolume_cbk]
>>> 0-gluster_disk-client-1: Server and Client lk-version numbers are
>>> not same, reopening the fds
>>>
>>> [2015-03-16 01:37:10.761945] I [fuse-bridge.c:5016:fuse_graph_setup]
>>> 0-fuse: switched to graph 0
>>>
>>> [2015-03-16 01:37:10.762144] I
>>> [client-handshake.c:450:client_set_lk_version_cbk]
>>> 0-gluster_disk-client-1: Server lk version = 1
>>>
>>> [*2015-03-16 01:37:10.762279*] I [fuse-bridge.c:3953:fuse_init]
>>> 0-glusterfs-fuse: FUSE inited with protocol versions: glusterfs 7.22
>>> kernel 7.14
>>>
>>> [*2015-03-16 01:59:26.098670*] W
>>> [fuse-bridge.c:2242:fuse_writev_cbk] 0-glusterfs-fuse: 292084: WRITE
>>> => -1 (Input/output error)
>>>
>>> ...
>>>
>>>
>>>
>>> I've seen no indication of split-brain on any files at any point in
>>> this (ever since downgrading from 3.6.2 to 3.5.3, which is when this
>>> particular issue started):
>>>
>>> [root@duke gfapi-module-for-linux-target-driver-]# gluster v heal
>>> gluster_disk info
>>>
>>> Brick duke.jonheese.local:/bricks/brick1/
>>>
>>> Number of entries: 0
>>>
>>>
>>>
>>> Brick duchess.jonheese.local:/bricks/brick1/
>>>
>>> Number of entries: 0
>>>
>>>
>>>
>>> Thanks.
>>>
>>>
>>>
>>> /Jon Heese/
>>> /Systems Engineer/
>>> *INetU Managed Hosting*
>>> P: 610.266.7441 x 261
>>> F: 610.266.7434
>>> www.inetu.net
>>>
>>> /** This message contains confidential information, which also may
>>> be privileged, and is intended only for the person(s) addressed
>>> above. Any unauthorized use, distribution, copying or disclosure of
>>> confidential and/or privileged information is strictly prohibited.
>>> If you have received this communication in error, please erase all
>>> copies of the message and its attachments and notify the sender
>>> immediately via reply e-mail. **/
>>>
>>>
>>>
>>> *From:* Ravishankar N [mailto:ravishankar at redhat.com]
>>> *Sent:* Tuesday, March 17, 2015 12:35 AM
>>> *To:* Jonathan Heese; gluster-users at gluster.org
>>> *Subject:* Re: [Gluster-users] I/O error on replicated volume
>>>
>>>
>>>
>>>
>>>
>>> On 03/17/2015 02:14 AM, Jonathan Heese wrote:
>>>
>>> Hello,
>>>
>>> So I resolved my previous issue with split-brains and the lack
>>> of self-healing by dropping my installed glusterfs* packages
>>> from 3.6.2 to 3.5.3, but now I've picked up a new issue, which
>>> actually makes normal use of the volume practically impossible.
>>>
>>> A little background for those not already paying close attention:
>>> I have a 2-node, 2-brick replicating volume whose purpose in life
>>> is to hold iSCSI target files, primarily for use to provide
>>> datastores to a VMware ESXi cluster. The plan is to put a
>>> handful of image files on the Gluster volume, mount them locally
>>> on both Gluster nodes, and run tgtd on both, pointed to the
>>> image files on the mounted gluster volume. Then the ESXi boxes
>>> will use multipath (active/passive) iSCSI to connect to the
>>> nodes, with automatic failover in case of planned or unplanned
>>> downtime of the Gluster nodes.
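>>>
>>> (For concreteness, a minimal sketch of what such a tgtd export
>>> looks like in /etc/tgt/targets.conf on CentOS 6; the IQN and image
>>> file name here are hypothetical, not taken from this setup:
>>>
>>>   <target iqn.2015-03.local.jonheese:gluster-disk1>
>>>       # image file living on the FUSE-mounted gluster volume
>>>       backing-store /mnt/gluster_disk/datastore1.img
>>>   </target>
>>>
>>> followed by "service tgtd restart" to pick up the change.)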
>>>
>>> In my most recent round of testing with 3.5.3, I'm seeing a
>>> massive failure to write data to the volume after about 5-10
>>> minutes, so I've simplified the scenario a bit (to minimize the
>>> variables) to: both Gluster nodes up, only one node (duke)
>>> mounted and running tgtd, and just regular (single-path) iSCSI
>>> from a single ESXi server.
>>>
>>> About 5-10 minutes into migrating a VM onto the test datastore,
>>> /var/log/messages on duke gets blasted with a ton of messages
>>> exactly like this:
>>>
>>> Mar 15 22:24:06 duke tgtd: bs_rdwr_request(180) io error
>>> 0x1781e00 2a -1 512 22971904, Input/output error
>>>
>>>
>>>
>>> And /var/log/glusterfs/mnt-gluster_disk.log gets blasted with a
>>> ton of messages exactly like this:
>>>
>>> [2015-03-16 02:24:07.572279] W
>>> [fuse-bridge.c:2242:fuse_writev_cbk] 0-glusterfs-fuse: 635299:
>>> WRITE => -1 (Input/output error)
>>>
>>>
>>>
>>>
>>> Are there any messages in the mount log from AFR about split-brain
>>> just before the above line appears?
>>> Does `gluster v heal <VOLNAME> info` show any files? Performing I/O
>>> on files that are in split-brain fails with EIO.
>>>
>>> -Ravi
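>>>
>>> (A concrete way to run both checks, assuming the volume name
>>> gluster_disk from this thread; the second form lists only
>>> split-brained entries:
>>>
>>>   gluster volume heal gluster_disk info
>>>   gluster volume heal gluster_disk info split-brain
>>>
>>> Files listed by the second command are the ones whose I/O would
>>> fail with EIO.)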
>>>
>>> And the write operation from VMware's side fails as soon as
>>> these messages start.
>>>
>>>
>>>
>>> I don't see any other errors (in the log files I know of)
>>> indicating the root cause of these I/O errors. I'm sure that
>>> this is not enough information to tell what's going on, but can
>>> anyone help me figure out what to look at next to figure this out?
>>>
>>>
>>>
>>> I've also considered using Dan Lambright's libgfapi gluster
>>> module for tgtd (or something similar) to avoid going through
>>> FUSE, but I'm not sure whether that would be irrelevant to this
>>> problem, since I'm not 100% sure if it lies in FUSE or elsewhere.
>>>
>>>
>>>
>>> Thanks!
>>>
>>>
>>>
>>> /Jon Heese/
>>> /Systems Engineer/
>>> *INetU Managed Hosting*
>>> P: 610.266.7441 x 261
>>> F: 610.266.7434
>>> www.inetu.net
>>>
>>> /** This message contains confidential information, which also
>>> may be privileged, and is intended only for the person(s)
>>> addressed above. Any unauthorized use, distribution, copying or
>>> disclosure of confidential and/or privileged information is
>>> strictly prohibited. If you have received this communication in
>>> error, please erase all copies of the message and its
>>> attachments and notify the sender immediately via reply e-mail. **/
>>>
>>>
>>>
>>>
>>>
>>> _______________________________________________
>>>
>>> Gluster-users mailing list
>>>
>>> Gluster-users at gluster.org
>>>
>>> http://www.gluster.org/mailman/listinfo/gluster-users
>>>
>>>
>>>
>>>
>>>
>>> _______________________________________________
>>> Gluster-users mailing list
>>> Gluster-users at gluster.org
>>> http://www.gluster.org/mailman/listinfo/gluster-users
>>