Mohamed Pakkeer
2015-May-26 08:15 UTC
[Gluster-users] Issue with Pro active self healing for Erasure coding
Hi Glusterfs Experts,

We are testing the glusterfs 3.7.0 tarball on our 10-node glusterfs cluster.
Each node has 36 drives; the volume info is below:

Volume Name: vaulttest5
Type: Distributed-Disperse
Volume ID: 68e082a6-9819-4885-856c-1510cd201bd9
Status: Started
Number of Bricks: 36 x (8 + 2) = 360
Transport-type: tcp
Bricks:
Brick1: 10.1.2.1:/media/disk1
Brick2: 10.1.2.2:/media/disk1
Brick3: 10.1.2.3:/media/disk1
Brick4: 10.1.2.4:/media/disk1
Brick5: 10.1.2.5:/media/disk1
Brick6: 10.1.2.6:/media/disk1
Brick7: 10.1.2.7:/media/disk1
Brick8: 10.1.2.8:/media/disk1
Brick9: 10.1.2.9:/media/disk1
Brick10: 10.1.2.10:/media/disk1
Brick11: 10.1.2.1:/media/disk2
Brick12: 10.1.2.2:/media/disk2
Brick13: 10.1.2.3:/media/disk2
Brick14: 10.1.2.4:/media/disk2
Brick15: 10.1.2.5:/media/disk2
Brick16: 10.1.2.6:/media/disk2
Brick17: 10.1.2.7:/media/disk2
Brick18: 10.1.2.8:/media/disk2
Brick19: 10.1.2.9:/media/disk2
Brick20: 10.1.2.10:/media/disk2
...
Brick351: 10.1.2.1:/media/disk36
Brick352: 10.1.2.2:/media/disk36
Brick353: 10.1.2.3:/media/disk36
Brick354: 10.1.2.4:/media/disk36
Brick355: 10.1.2.5:/media/disk36
Brick356: 10.1.2.6:/media/disk36
Brick357: 10.1.2.7:/media/disk36
Brick358: 10.1.2.8:/media/disk36
Brick359: 10.1.2.9:/media/disk36
Brick360: 10.1.2.10:/media/disk36
Options Reconfigured:
performance.readdir-ahead: on

We did some performance testing and simulated proactive self-healing for
erasure coding. The disperse volume is created across nodes, so each
(8 + 2) set has exactly one brick on each of the 10 nodes.

*Description of problem*

I disconnected the *network of two nodes* and wrote some video files, and
*glusterfs* *wrote the video files to the remaining 8 nodes perfectly*. I
downloaded the uploaded file and it came back intact. Then I re-enabled
the network on the two nodes; proactive self-healing worked and rebuilt
the previously unavailable chunks of data on the re-enabled nodes from
the other 8 nodes. But when I then tried to download the same file, I got
an Input/output error and couldn't download it. I think there is an issue
in proactive self-healing.

We also tried the simulation with a single-node network failure and hit
the same I/O error while downloading the file.

*Error while downloading the file*

root@master02:/home/admin# rsync -r --progress /mnt/gluster/file13_AN ./1/file13_AN-2
sending incremental file list
file13_AN
  3,342,355,597 100%    4.87MB/s    0:10:54 (xfr#1, to-chk=0/1)
rsync: read errors mapping "/mnt/gluster/file13_AN": Input/output error (5)
WARNING: file13_AN failed verification -- update discarded (will try again).

root@master02:/home/admin# cp /mnt/gluster/file13_AN ./1/file13_AN-3
cp: error reading ‘/mnt/gluster/file13_AN’: Input/output error
cp: failed to extend ‘./1/file13_AN-3’: Input/output error

We can't tell whether the issue is in glusterfs 3.7.0 itself or in our
glusterfs configuration. Any help would be greatly appreciated.

--
Cheers
Backer
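P.S. For anyone who wants to reproduce this, the rough sequence is sketched
below. The interface name, the ifdown/ifup calls and the source path of the
video file are illustrative only (we actually cut the links at the switch),
and the heal-info step assumes "gluster volume heal ... info" already works
for disperse volumes on this build:

# 1) Cut the network on two of the ten storage nodes
ifdown eth0

# 2) From a client, write a large file through the FUSE mount and verify it
cp /data/video/file13_AN /mnt/gluster/file13_AN
md5sum /mnt/gluster/file13_AN      # still reads fine with 2 of 10 bricks per set down

# 3) Reconnect the nodes and let the self-heal daemon rebuild the fragments
ifup eth0
gluster volume heal vaulttest5 info    # wait until no entries remain

# 4) Read the file again -- this is where we now get EIO
rsync --progress /mnt/gluster/file13_AN ./1/file13_AN-2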
Xavier Hernandez
2015-May-27 07:52 UTC
[Gluster-users] Issue with Pro active self healing for Erasure coding
Hi,

Some Input/Output error issues have been identified and fixed. These fixes
will be available in 3.7.1.

Xavi

On 05/26/2015 10:15 AM, Mohamed Pakkeer wrote:
> [...]
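Until the fix is released, you can check whether the healed fragments are
actually consistent by comparing the ec xattrs of the file on the bricks.
This is only a quick sketch: the path assumes the file still sits at the
volume root, and trusted.ec.version / trusted.ec.size are the per-fragment
metadata that the disperse translator maintains:

# Run on each of the 10 nodes. The file hashes to a single disperse set,
# so only one /media/diskN per node will match; a fragment whose
# trusted.ec.* values disagree with the majority is the one breaking reads.
getfattr -d -m 'trusted.ec.' -e hex /media/disk*/file13_AN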