On 03/17/2015 02:14 AM, Jonathan Heese wrote:
> Hello,
>
> So I resolved my previous issue with split-brains and the lack of
> self-healing by dropping my installed glusterfs* packages from 3.6.2
> to 3.5.3, but now I've picked up a new issue, which actually makes
> normal use of the volume practically impossible.
>
> A little background for those not already paying close attention:
> I have a 2 node 2 brick replicating volume whose purpose in life is to
> hold iSCSI target files, primarily to provide datastores to a
> VMware ESXi cluster. The plan is to put a handful of image files on
> the Gluster volume, mount them locally on both Gluster nodes, and run
> tgtd on both, pointed to the image files on the mounted gluster
> volume. Then the ESXi boxes will use multipath (active/passive) iSCSI
> to connect to the nodes, with automatic failover in case of planned or
> unplanned downtime of the Gluster nodes.
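(For reference, a rough sketch of that layout. The volume name "gvol0", the
brick paths, and the second node name "duchess" are placeholders, not taken
from the original message; the /mnt/gluster_disk mount point is inferred from
the mount log path quoted further down:)

    # on either node: create and start the 2-brick replicated volume
    gluster volume create gvol0 replica 2 \
        duke:/bricks/brick1/gvol0 duchess:/bricks/brick1/gvol0
    gluster volume start gvol0

    # on each node: FUSE-mount the volume locally so tgtd can reach the image files
    mount -t glusterfs localhost:/gvol0 /mnt/gluster_disk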
>
> In my most recent round of testing with 3.5.3, I'm seeing a massive
> failure to write data to the volume after about 5-10 minutes, so I've
> simplified the scenario a bit (to minimize the variables) to: both
> Gluster nodes up, only one node (duke) mounted and running tgtd, and
> just regular (single path) iSCSI from a single ESXi server.
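(In other words, something like the following on duke; the target IQN and
image file name are placeholders:)

    # /etc/tgt/targets.conf on duke, pointing at an image on the FUSE mount
    <target iqn.2015-03.net.example:datastore1>
        backing-store /mnt/gluster_disk/datastore1.img
    </target>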
>
> About 5-10 minutes into migrating a VM onto the test datastore,
> /var/log/messages on duke gets blasted with a ton of messages exactly
> like this:
>
> Mar 15 22:24:06 duke tgtd: bs_rdwr_request(180) io error 0x1781e00 2a
> -1 512 22971904, Input/output error
>
>
> And /var/log/glusterfs/mnt-gluster_disk.log gets blasted with a ton of
> messages exactly like this:
>
> [2015-03-16 02:24:07.572279] W [fuse-bridge.c:2242:fuse_writev_cbk]
> 0-glusterfs-fuse: 635299: WRITE => -1 (Input/output error)
>
>
Are there any messages in the mount log from AFR about split-brain just
before the above line appears?
Does `gluster v heal <VOLNAME> info` show any files? Performing I/O on
files that are in split-brain fails with EIO.
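For example, substituting your volume name:

    gluster volume heal <VOLNAME> info
    gluster volume heal <VOLNAME> info split-brain
    grep -iE 'split-brain|afr' /var/log/glusterfs/mnt-gluster_disk.log

If heal info lists any files, those are the likely source of the EIO errors.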
-Ravi
> And the write operation from VMware's side fails as soon as these
> messages start.
>
>
> I don't see any other errors (in the log files I know of) indicating
> the root cause of these i/o errors. I'm sure that this is not enough
> information to tell what's going on, but can anyone help me figure out
> what to look at next?
>
>
> I've also considered using Dan Lambright's libgfapi gluster module for
> tgtd (or something similar) to avoid going through FUSE, but I'm not
> sure whether that would be irrelevant to this problem, since I'm not
> 100% sure if it lies in FUSE or elsewhere.
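(For what it's worth, if tgtd is built with the glfs backing store, the
targets.conf change is roughly the following; the exact backing-store path
syntax here is from memory, so please verify it against the documentation
shipped with your tgt version:)

    <target iqn.2015-03.net.example:datastore1>
        bs-type glfs
        # format is roughly <volume>@<gluster server>:<image file>; check tgt's glfs docs
        backing-store gvol0@duke:datastore1.img
    </target>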
>
>
> Thanks!
>
>
> /Jon Heese/
> /Systems Engineer/
> *INetU Managed Hosting*
> P: 610.266.7441 x 261
> F: 610.266.7434
> www.inetu.net <https://www.inetu.net/>
>
> /** This message contains confidential information, which also may be
> privileged, and is intended only for the person(s) addressed above.
> Any unauthorized use, distribution, copying or disclosure of
> confidential and/or privileged information is strictly prohibited. If
> you have received this communication in error, please erase all copies
> of the message and its attachments and notify the sender immediately
> via reply e-mail. **/
>
>
>
> _______________________________________________
> Gluster-users mailing list
> Gluster-users at gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-users