On 03/27/2015 11:04 AM, Jonathan Heese wrote:
> On Mar 27, 2015, at 1:29 AM, "Mohammed Rafi K C" <rkavunga at redhat.com> wrote:
>
>>
>> When we change the transport from x to y, it should be reflected in all
>> the vol files. Unfortunately, the volume set command failed to make the
>> change in the nfs server volfile (which is, of course, a bug). As I
>> mentioned in my previous mails, changing the volume files using the
>> volume set command is not recommended; I suggested it only to check
>> whether tcp works fine or not.
>>
>> The reason you are getting the rdma connection error is that the bricks
>> are now running over tcp, so the brick processes are listening on TCP
>> sockets. But the nfs-server still asks for an rdma connection, so it is
>> trying to connect from an rdma port to a tcp port.
>> Obviously the connection will be rejected.
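>>
>> As a quick sanity check (just a sketch; the brick port 49152 is the one
>> reported by "gluster volume status" later in this thread), you can
>> confirm on each server that the bricks are now listening on TCP sockets:
>>
>> gluster volume status gluster_disk    # note the port shown for each brick
>> netstat -tlnp | grep 49152            # the glusterfsd brick should appear as a TCP listener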
>
> Okay, thanks for the thorough explanation there.
>
> Now that we know that TCP does function without the original I/O
> errors (from the start of this thread), how do you suggest that I proceed?
>
> Do I have to wait for a subsequent release to rid myself of this bug?
I will make sure the bug gets fixed. We can also expect that the rdma
patches will be merged soon. Once the rdma bugs are fixed in 3.5.x, you can
test and switch back to rdma.
>
> Would it be feasible for me to switch from RDMA to TCP in a more
> permanent fashion (maybe wipe the cluster and start over?)?
Either you can manually edit the nfs volfile, changing the transport to
"option transport-type tcp" in all the places, and then restart nfs to
solve the problem; or, if possible, you can start a fresh cluster
running on tcp.
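
As a rough sketch (assuming the volfile path mentioned earlier in this
thread), something like the following on each server would change every
remaining rdma transport line; verify with grep before restarting nfs:

sed -i 's/option transport-type rdma/option transport-type tcp/' /var/lib/glusterd/nfs/nfs-server.vol
grep transport-type /var/lib/glusterd/nfs/nfs-server.vol    # should now show only tcp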
Rafi
>
> Thanks.
>
> Regards,
> Jon Heese
>
>> Regards
>> Rafi KC
>>
>> On 03/27/2015 12:28 AM, Jonathan Heese wrote:
>>>
>>> Rafi,
>>>
>>>
>>> Here is my nfs-server.vol file:
>>>
>>>
>>> [root at duke ~]# cat /var/lib/glusterd/nfs/nfs-server.vol
>>> volume gluster_disk-client-0
>>> type protocol/client
>>> option send-gids true
>>> option password 562ab460-7754-4b5a-82e6-18ed6c130786
>>> option username ad5d5754-cf02-4b96-9f85-ff3129ae0405
>>> option transport-type rdma
>>> option remote-subvolume /bricks/brick1
>>> option remote-host duke-ib
>>> end-volume
>>>
>>> volume gluster_disk-client-1
>>> type protocol/client
>>> option send-gids true
>>> option password 562ab460-7754-4b5a-82e6-18ed6c130786
>>> option username ad5d5754-cf02-4b96-9f85-ff3129ae0405
>>> option transport-type rdma
>>> option remote-subvolume /bricks/brick1
>>> option remote-host duchess-ib
>>> end-volume
>>>
>>> volume gluster_disk-replicate-0
>>> type cluster/replicate
>>> subvolumes gluster_disk-client-0 gluster_disk-client-1
>>> end-volume
>>>
>>> volume gluster_disk-dht
>>> type cluster/distribute
>>> subvolumes gluster_disk-replicate-0
>>> end-volume
>>>
>>> volume gluster_disk-write-behind
>>> type performance/write-behind
>>> subvolumes gluster_disk-dht
>>> end-volume
>>>
>>> volume gluster_disk
>>> type debug/io-stats
>>> option count-fop-hits off
>>> option latency-measurement off
>>> subvolumes gluster_disk-write-behind
>>> end-volume
>>>
>>> volume nfs-server
>>> type nfs/server
>>> option nfs3.gluster_disk.volume-id 2307a5a8-641e-44f4-8eaf-7cc2b704aafd
>>> option rpc-auth.addr.gluster_disk.allow *
>>> option nfs.drc off
>>> option nfs.nlm on
>>> option nfs.dynamic-volumes on
>>> subvolumes gluster_disk
>>> end-volume
>>>
>>> I see that "transport-type rdma" is listed a couple times here, but
>>> "gluster volume info" indicates that the volume is using the tcp
>>> transport:
>>>
>>>
>>> [root at duke ~]# gluster volume info gluster_disk
>>>
>>> Volume Name: gluster_disk
>>> Type: Replicate
>>> Volume ID: 2307a5a8-641e-44f4-8eaf-7cc2b704aafd
>>> Status: Started
>>> Number of Bricks: 1 x 2 = 2
>>> Transport-type: tcp
>>> Bricks:
>>> Brick1: duke-ib:/bricks/brick1
>>> Brick2: duchess-ib:/bricks/brick1
>>> Options Reconfigured:
>>> config.transport: tcp
>>>
>>> Please let me know if you need any further information from me to
>>> determine how to correct this discrepancy.
>>>
>>>
>>> Also, I feel compelled to ask: Since the TCP connections are going
>>> over the InfiniBand connections between the two Gluster servers
>>> (based on the hostnames which are pointed to the IB IPs via hosts
>>> files), are there any (significant) drawbacks to using TCP instead
>>> of RDMA here? Thanks.
>>>
>>>
>>> Regards,
>>>
>>> Jon Heese
>>>
>>>
>>>
>>> ------------------------------------------------------------------------
>>> *From:* Mohammed Rafi K C <rkavunga at redhat.com>
>>> *Sent:* Monday, March 23, 2015 3:29 AM
>>> *To:* Jonathan Heese
>>> *Cc:* gluster-users
>>> *Subject:* Re: [Gluster-users] I/O error on replicated volume
>>>
>>>
>>> On 03/23/2015 11:28 AM, Jonathan Heese wrote:
>>>> On Mar 23, 2015, at 1:20 AM, "Mohammed Rafi K C"
>>>> <rkavunga at redhat.com <mailto:rkavunga at
redhat.com>> wrote:
>>>>
>>>>>
>>>>> On 03/21/2015 07:49 PM, Jonathan Heese wrote:
>>>>>>
>>>>>> Mohammed,
>>>>>>
>>>>>>
>>>>>> I have completed the steps you suggested (unmount all, stop the
>>>>>> volume, set the config.transport to tcp, start the volume, mount,
>>>>>> etc.), and the behavior has indeed changed.
>>>>>>
>>>>>>
>>>>>> [root at duke ~]# gluster volume info
>>>>>>
>>>>>> Volume Name: gluster_disk
>>>>>> Type: Replicate
>>>>>> Volume ID: 2307a5a8-641e-44f4-8eaf-7cc2b704aafd
>>>>>> Status: Started
>>>>>> Number of Bricks: 1 x 2 = 2
>>>>>> Transport-type: tcp
>>>>>> Bricks:
>>>>>> Brick1: duke-ib:/bricks/brick1
>>>>>> Brick2: duchess-ib:/bricks/brick1
>>>>>> Options Reconfigured:
>>>>>> config.transport: tcp
>>>>>>
>>>>>>
>>>>>> [root at duke ~]# gluster volume status
>>>>>> Status of volume: gluster_disk
>>>>>> Gluster process                                 Port    Online  Pid
>>>>>> ------------------------------------------------------------------------------
>>>>>> Brick duke-ib:/bricks/brick1                    49152   Y       16362
>>>>>> Brick duchess-ib:/bricks/brick1                 49152   Y       14155
>>>>>> NFS Server on localhost                         2049    Y       16374
>>>>>> Self-heal Daemon on localhost                   N/A     Y       16381
>>>>>> NFS Server on duchess-ib                        2049    Y       14167
>>>>>> Self-heal Daemon on duchess-ib                  N/A     Y       14174
>>>>>>
>>>>>> Task Status of Volume gluster_disk
>>>>>> ------------------------------------------------------------------------------
>>>>>> There are no active volume tasks
>>>>>>
>>>>>> I am no longer seeing the I/O errors during prolonged periods of
>>>>>> write I/O that I was seeing when the transport was set to rdma.
>>>>>> However, I am seeing this message on both nodes every 3 seconds
>>>>>> (almost exactly):
>>>>>>
>>>>>>
>>>>>> ==> /var/log/glusterfs/nfs.log <==
>>>>>> [2015-03-21 14:17:40.379719] W
>>>>>> [rdma.c:1076:gf_rdma_cm_event_handler] 0-gluster_disk-client-1:
>>>>>> cma event RDMA_CM_EVENT_REJECTED, error 8 (me:10.10.10.1:1023
>>>>>> peer:10.10.10.2:49152)
>>>>>>
>>>>>>
>>>>>> Is this something to worry about?
>>>>>>
>>>>> If you are not using nfs to export the volumes, there is nothing
>>>>> to worry about.
>>>>
>>>> I'm using the native glusterfs FUSE component to mount the volume
>>>> locally on both servers -- I assume that you're referring to the
>>>> standard NFS protocol stuff, which I'm not using here.
>>>>
>>>> Incidentally, I would like to keep my logs from filling up with
>>>> junk if possible. Is there something I can do to get rid of these
>>>> (useless?) error messages?
>>>
>>> If I understand correctly, you are getting this enormous number of log
>>> messages from the nfs log only, and all other logs and everything else
>>> are fine now, right? If that is the case, and you are not using nfs at
>>> all for exporting the volume, then as a workaround you can disable nfs
>>> for your volume or cluster (gluster v set <volname> nfs.disable on).
>>> This will turn off your gluster nfs server, and you will no longer get
>>> those log messages.
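>>>
>>> For the volume in this thread, that would be something like (a sketch;
>>> run on either gluster node):
>>>
>>> gluster volume set gluster_disk nfs.disable on
>>> gluster volume info gluster_disk | grep nfs.disable   # confirm it took effect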
>>>
>>>
>>>>>> Any idea why there are rdma pieces in play when I've set my
>>>>>> transport to tcp?
>>>>>>
>>>>>
>>>>> There should not be any piece of rdma. If possible, can you paste
>>>>> the volfile for the nfs server? You can find the volfile in
>>>>> /var/lib/glusterd/nfs/nfs-server.vol or
>>>>> /usr/local/var/lib/glusterd/nfs/nfs-server.vol
>>>>
>>>> I will get this for you when I can. Thanks.
>>>
>>> If you can get that, it would be a great help in understanding the problem.
>>>
>>>
>>> Rafi KC
>>>
>>>>
>>>> Regards,
>>>> Jon Heese
>>>>
>>>>> Rafi KC
>>>>>>
>>>>>> The actual I/O appears to be handled properly and I've seen no
>>>>>> further errors in the testing I've done so far.
>>>>>>
>>>>>>
>>>>>> Thanks.
>>>>>>
>>>>>>
>>>>>> Regards,
>>>>>>
>>>>>> Jon Heese
>>>>>>
>>>>>>
>>>>>>
>>>>>> ------------------------------------------------------------------------
>>>>>> *From:* gluster-users-bounces at gluster.org
>>>>>> <gluster-users-bounces at gluster.org> on behalf of Jonathan Heese
>>>>>> <jheese at inetu.net>
>>>>>> *Sent:* Friday, March 20, 2015 7:04 AM
>>>>>> *To:* Mohammed Rafi K C
>>>>>> *Cc:* gluster-users
>>>>>> *Subject:* Re: [Gluster-users] I/O error on replicated volume
>>>>>>
>>>>>> Mohammed,
>>>>>>
>>>>>> Thanks very much for the reply. I will try that and report back.
>>>>>>
>>>>>> Regards,
>>>>>> Jon Heese
>>>>>>
>>>>>> On Mar 20, 2015, at 3:26 AM, "Mohammed Rafi K C"
>>>>>> <rkavunga at redhat.com> wrote:
>>>>>>
>>>>>>>
>>>>>>> On 03/19/2015 10:16 PM, Jonathan Heese wrote:
>>>>>>>>
>>>>>>>> Hello all,
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Does anyone else have any further suggestions for
>>>>>>>> troubleshooting this?
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> To sum up: I have a 2 node 2 brick replicated volume, which
>>>>>>>> holds a handful of iSCSI image files which are mounted and
>>>>>>>> served up by tgtd (CentOS 6) to a handful of devices on a
>>>>>>>> dedicated iSCSI network. The most important iSCSI clients
>>>>>>>> (initiators) are four VMware ESXi 5.5 hosts that use the iSCSI
>>>>>>>> volumes as backing for their datastores for virtual machine
>>>>>>>> storage.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> After a few minutes of sustained writing to the volume, I am
>>>>>>>> seeing a massive flood (over 1500 per second at times) of this
>>>>>>>> error in /var/log/glusterfs/mnt-gluster-disk.log:
>>>>>>>>
>>>>>>>> [2015-03-16 02:24:07.582801] W
>>>>>>>> [fuse-bridge.c:2242:fuse_writev_cbk] 0-glusterfs-fuse: 635358:
>>>>>>>> WRITE => -1 (Input/output error)
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> When this happens, the ESXi box fails its write operation and
>>>>>>>> returns an error to the effect of "Unable to write data to
>>>>>>>> datastore". I don't see anything else in the supporting logs
>>>>>>>> to explain the root cause of the i/o errors.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Any and all suggestions are appreciated. Thanks.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>> From the mount logs, I assume that your volume transport type is
>>>>>>> rdma. There are some known issues with rdma in 3.5.3, and the
>>>>>>> patches to address those issues have already been sent upstream
>>>>>>> [1]. From the logs, I'm not sure, and it is hard to tell you,
>>>>>>> whether this problem is related to the rdma transport or not.
>>>>>>> To make sure that the tcp transport works well in this scenario,
>>>>>>> if possible can you try to reproduce the same using a tcp-type
>>>>>>> volume. You can change the transport type of the volume with the
>>>>>>> following steps (not recommended in normal use; see the sketch
>>>>>>> after the list):
>>>>>>>
>>>>>>> 1) unmount every client
>>>>>>> 2) stop the volume
>>>>>>> 3) run gluster volume set volname config.transport tcp
>>>>>>> 4) start the volume again
>>>>>>> 5) mount the clients
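>>>>>>>
>>>>>>> Put together as a rough sketch for this volume (the mount point and
>>>>>>> mount source are assumptions; run the umount/mount on every client):
>>>>>>>
>>>>>>> umount /mnt/gluster-disk
>>>>>>> gluster volume stop gluster_disk
>>>>>>> gluster volume set gluster_disk config.transport tcp
>>>>>>> gluster volume start gluster_disk
>>>>>>> mount -t glusterfs duke-ib:/gluster_disk /mnt/gluster-disk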
>>>>>>>
>>>>>>> [1] : http://goo.gl/2PTL61
>>>>>>>
>>>>>>> Regards
>>>>>>> Rafi KC
>>>>>>>
>>>>>>>> /Jon Heese/
>>>>>>>> /Systems Engineer/
>>>>>>>> *INetU Managed Hosting*
>>>>>>>> P: 610.266.7441 x 261
>>>>>>>> F: 610.266.7434
>>>>>>>> www.inetu.net <https://www.inetu.net/>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> *From:* Jonathan Heese
>>>>>>>> *Sent:* Tuesday, March 17, 2015 12:36 PM
>>>>>>>> *To:* 'Ravishankar N'; gluster-users at gluster.org
>>>>>>>> *Subject:* RE: [Gluster-users] I/O error on replicated volume
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Ravi,
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> The last lines in the mount log before the massive vomit of I/O
>>>>>>>> errors are from 22 minutes prior, and seem innocuous to me:
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> [2015-03-16 01:37:07.126340] E
>>>>>>>> [client-handshake.c:1760:client_query_portmap_cbk]
>>>>>>>> 0-gluster_disk-client-0: failed to get the port number for
>>>>>>>> remote subvolume. Please run 'gluster volume status' on server
>>>>>>>> to see if brick process is running.
>>>>>>>>
>>>>>>>> [2015-03-16 01:37:07.126587] W [rdma.c:4273:gf_rdma_disconnect]
>>>>>>>> (-->/usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0x13f) [0x7fd9c557bccf]
>>>>>>>> (-->/usr/lib64/libgfrpc.so.0(rpc_clnt_handle_reply+0xa5) [0x7fd9c557a995]
>>>>>>>> (-->/usr/lib64/glusterfs/3.5.3/xlator/protocol/client.so(client_query_portmap_cbk+0x1ea) [0x7fd9c0d8fb9a])))
>>>>>>>> 0-gluster_disk-client-0: disconnect called (peer:10.10.10.1:24008)
>>>>>>>>
>>>>>>>> [2015-03-16 01:37:07.126687] E
>>>>>>>> [client-handshake.c:1760:client_query_portmap_cbk]
>>>>>>>> 0-gluster_disk-client-1: failed to get the port number for
>>>>>>>> remote subvolume. Please run 'gluster volume status' on server
>>>>>>>> to see if brick process is running.
>>>>>>>>
>>>>>>>> [2015-03-16 01:37:07.126737] W [rdma.c:4273:gf_rdma_disconnect]
>>>>>>>> (-->/usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0x13f) [0x7fd9c557bccf]
>>>>>>>> (-->/usr/lib64/libgfrpc.so.0(rpc_clnt_handle_reply+0xa5) [0x7fd9c557a995]
>>>>>>>> (-->/usr/lib64/glusterfs/3.5.3/xlator/protocol/client.so(client_query_portmap_cbk+0x1ea) [0x7fd9c0d8fb9a])))
>>>>>>>> 0-gluster_disk-client-1: disconnect called (peer:10.10.10.2:24008)
>>>>>>>>
>>>>>>>> [2015-03-16 01:37:10.730165] I [rpc-clnt.c:1729:rpc_clnt_reconfig]
>>>>>>>> 0-gluster_disk-client-0: changing port to 49152 (from 0)
>>>>>>>>
>>>>>>>> [2015-03-16 01:37:10.730276] W [rdma.c:4273:gf_rdma_disconnect]
>>>>>>>> (-->/usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0x13f) [0x7fd9c557bccf]
>>>>>>>> (-->/usr/lib64/libgfrpc.so.0(rpc_clnt_handle_reply+0xa5) [0x7fd9c557a995]
>>>>>>>> (-->/usr/lib64/glusterfs/3.5.3/xlator/protocol/client.so(client_query_portmap_cbk+0x1ea) [0x7fd9c0d8fb9a])))
>>>>>>>> 0-gluster_disk-client-0: disconnect called (peer:10.10.10.1:24008)
>>>>>>>>
>>>>>>>> [2015-03-16 01:37:10.739500] I [rpc-clnt.c:1729:rpc_clnt_reconfig]
>>>>>>>> 0-gluster_disk-client-1: changing port to 49152 (from 0)
>>>>>>>>
>>>>>>>> [2015-03-16 01:37:10.739560] W [rdma.c:4273:gf_rdma_disconnect]
>>>>>>>> (-->/usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0x13f) [0x7fd9c557bccf]
>>>>>>>> (-->/usr/lib64/libgfrpc.so.0(rpc_clnt_handle_reply+0xa5) [0x7fd9c557a995]
>>>>>>>> (-->/usr/lib64/glusterfs/3.5.3/xlator/protocol/client.so(client_query_portmap_cbk+0x1ea) [0x7fd9c0d8fb9a])))
>>>>>>>> 0-gluster_disk-client-1: disconnect called (peer:10.10.10.2:24008)
>>>>>>>>
>>>>>>>> [2015-03-16 01:37:10.741883] I
>>>>>>>> [client-handshake.c:1677:select_server_supported_programs]
>>>>>>>> 0-gluster_disk-client-0: Using Program GlusterFS 3.3, Num
>>>>>>>> (1298437), Version (330)
>>>>>>>>
>>>>>>>> [2015-03-16 01:37:10.744524] I
>>>>>>>> [client-handshake.c:1462:client_setvolume_cbk]
>>>>>>>> 0-gluster_disk-client-0: Connected to 10.10.10.1:49152,
>>>>>>>> attached to remote volume '/bricks/brick1'.
>>>>>>>>
>>>>>>>> [2015-03-16 01:37:10.744537] I
>>>>>>>> [client-handshake.c:1474:client_setvolume_cbk]
>>>>>>>> 0-gluster_disk-client-0: Server and Client lk-version numbers
>>>>>>>> are not same, reopening the fds
>>>>>>>>
>>>>>>>> [2015-03-16 01:37:10.744566] I [afr-common.c:4267:afr_notify]
>>>>>>>> 0-gluster_disk-replicate-0: Subvolume 'gluster_disk-client-0'
>>>>>>>> came back up; going online.
>>>>>>>>
>>>>>>>> [2015-03-16 01:37:10.744627] I
>>>>>>>> [client-handshake.c:450:client_set_lk_version_cbk]
>>>>>>>> 0-gluster_disk-client-0: Server lk version = 1
>>>>>>>>
>>>>>>>> [2015-03-16 01:37:10.753037] I
>>>>>>>> [client-handshake.c:1677:select_server_supported_programs]
>>>>>>>> 0-gluster_disk-client-1: Using Program GlusterFS 3.3, Num
>>>>>>>> (1298437), Version (330)
>>>>>>>>
>>>>>>>> [2015-03-16 01:37:10.755657] I
>>>>>>>> [client-handshake.c:1462:client_setvolume_cbk]
>>>>>>>> 0-gluster_disk-client-1: Connected to 10.10.10.2:49152,
>>>>>>>> attached to remote volume '/bricks/brick1'.
>>>>>>>>
>>>>>>>> [2015-03-16 01:37:10.755676] I
>>>>>>>> [client-handshake.c:1474:client_setvolume_cbk]
>>>>>>>> 0-gluster_disk-client-1: Server and Client lk-version numbers
>>>>>>>> are not same, reopening the fds
>>>>>>>>
>>>>>>>> [2015-03-16 01:37:10.761945] I
>>>>>>>> [fuse-bridge.c:5016:fuse_graph_setup] 0-fuse: switched to graph 0
>>>>>>>>
>>>>>>>> [2015-03-16 01:37:10.762144] I
>>>>>>>> [client-handshake.c:450:client_set_lk_version_cbk]
>>>>>>>> 0-gluster_disk-client-1: Server lk version = 1
>>>>>>>>
>>>>>>>> [*2015-03-16 01:37:10.762279*] I [fuse-bridge.c:3953:fuse_init]
>>>>>>>> 0-glusterfs-fuse: FUSE inited with protocol versions: glusterfs
>>>>>>>> 7.22 kernel 7.14
>>>>>>>>
>>>>>>>> [*2015-03-16 01:59:26.098670*] W
>>>>>>>> [fuse-bridge.c:2242:fuse_writev_cbk] 0-glusterfs-fuse: 292084:
>>>>>>>> WRITE => -1 (Input/output error)
>>>>>>>>
>>>>>>>> ...
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> I've seen no indication of split-brain on any files at any
>>>>>>>> point in this (ever since downdating from 3.6.2 to 3.5.3, which
>>>>>>>> is when this particular issue started):
>>>>>>>>
>>>>>>>> [root at duke gfapi-module-for-linux-target-driver-]# gluster v
>>>>>>>> heal gluster_disk info
>>>>>>>>
>>>>>>>> Brick duke.jonheese.local:/bricks/brick1/
>>>>>>>>
>>>>>>>> Number of entries: 0
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Brick duchess.jonheese.local:/bricks/brick1/
>>>>>>>>
>>>>>>>> Number of entries: 0
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Thanks.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> /Jon Heese/
>>>>>>>> /Systems Engineer/
>>>>>>>> *INetU Managed Hosting*
>>>>>>>> P: 610.266.7441 x 261
>>>>>>>> F: 610.266.7434
>>>>>>>> www.inetu.net <https://www.inetu.net/>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> *From:* Ravishankar N [mailto:ravishankar at redhat.com]
>>>>>>>> *Sent:* Tuesday, March 17, 2015 12:35 AM
>>>>>>>> *To:* Jonathan Heese; gluster-users at gluster.org
>>>>>>>> *Subject:* Re: [Gluster-users] I/O error on replicated volume
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On 03/17/2015 02:14 AM, Jonathan Heese wrote:
>>>>>>>>
>>>>>>>> Hello,
>>>>>>>>
>>>>>>>> So I resolved my previous issue with split-brains and the
>>>>>>>> lack of self-healing by dropping my installed glusterfs*
>>>>>>>> packages from 3.6.2 to 3.5.3, but now I've picked up a new
>>>>>>>> issue, which actually makes normal use of the volume
>>>>>>>> practically impossible.
>>>>>>>>
>>>>>>>> A little background for those not already paying close
>>>>>>>> attention:
>>>>>>>> I have a 2 node 2 brick replicating volume whose purpose in
>>>>>>>> life is to hold iSCSI target files, primarily for use to
>>>>>>>> provide datastores to a VMware ESXi cluster. The plan is
>>>>>>>> to put a handful of image files on the Gluster volume,
>>>>>>>> mount them locally on both Gluster nodes, and run tgtd on
>>>>>>>> both, pointed to the image files on the mounted gluster
>>>>>>>> volume. Then the ESXi boxes will use multipath
>>>>>>>> (active/passive) iSCSI to connect to the nodes, with
>>>>>>>> automatic failover in case of planned or unplanned downtime
>>>>>>>> of the Gluster nodes.
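>>>>>>>>
>>>>>>>> (For illustration, a minimal tgtd export of one such image file in
>>>>>>>> /etc/tgt/targets.conf would look something like the following; the
>>>>>>>> IQN and image path here are just placeholders:)
>>>>>>>>
>>>>>>>> <target iqn.2015-03.local.jonheese:datastore1>
>>>>>>>>     backing-store /mnt/gluster-disk/datastore1.img
>>>>>>>> </target>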
>>>>>>>>
>>>>>>>> In my most recent round of testing with 3.5.3, I'm seeing a
>>>>>>>> massive failure to write data to the volume after about
>>>>>>>> 5-10 minutes, so I've simplified the scenario a bit (to
>>>>>>>> minimize the variables) to: both Gluster nodes up, only one
>>>>>>>> node (duke) mounted and running tgtd, and just regular
>>>>>>>> (single path) iSCSI from a single ESXi server.
>>>>>>>>
>>>>>>>> About 5-10 minutes into migrating a VM onto the test
>>>>>>>> datastore, /var/log/messages on duke gets blasted with a
>>>>>>>> ton of messages exactly like this:
>>>>>>>>
>>>>>>>> Mar 15 22:24:06 duke tgtd: bs_rdwr_request(180) io error
>>>>>>>> 0x1781e00 2a -1 512 22971904, Input/output error
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> And /var/log/glusterfs/mnt-gluster_disk.log gets blasted
>>>>>>>> with a ton of messages exactly like this:
>>>>>>>>
>>>>>>>> [2015-03-16 02:24:07.572279] W
>>>>>>>> [fuse-bridge.c:2242:fuse_writev_cbk] 0-glusterfs-fuse:
>>>>>>>> 635299: WRITE => -1 (Input/output error)
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Are there any messages in the mount log from AFR about
>>>>>>>> split-brain just before the above line appears?
>>>>>>>> Does `gluster v heal <VOLNAME> info` show any files? Performing
>>>>>>>> I/O on files that are in split-brain fails with EIO.
>>>>>>>>
>>>>>>>> -Ravi
>>>>>>>>
>>>>>>>> And the write operation from VMware's side fails as soon as
>>>>>>>> these messages start.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> I don't see any other errors (in the log files I know of)
>>>>>>>> indicating the root cause of these i/o errors. I'm sure
>>>>>>>> that this is not enough information to tell what's going
>>>>>>>> on, but can anyone help me figure out what to look at next
>>>>>>>> to figure this out?
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> I've also considered using Dan Lambright's libgfapi gluster
>>>>>>>> module for tgtd (or something similar) to avoid going
>>>>>>>> through FUSE, but I'm not sure whether that would be
>>>>>>>> irrelevant to this problem, since I'm not 100% sure if it
>>>>>>>> lies in FUSE or elsewhere.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Thanks!
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> /Jon Heese/
>>>>>>>> /Systems Engineer/
>>>>>>>> *INetU Managed Hosting*
>>>>>>>> P: 610.266.7441 x 261
>>>>>>>> F: 610.266.7434
>>>>>>>> www.inetu.net <https://www.inetu.net/>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> _______________________________________________
>>>>>>>> Gluster-users mailing list
>>>>>>>> Gluster-users at gluster.org
>>>>>>>>
http://www.gluster.org/mailman/listinfo/gluster-users
>>>>>>>
>>>>>
>>>
>>