Mohamed Pakkeer
2015-May-26 08:15 UTC
[Gluster-users] Issue with Pro active self healing for Erasure coding
Hi Glusterfs Experts,

We are testing the glusterfs 3.7.0 tarball on our 10-node glusterfs cluster.
Each node has 36 drives; the volume info is below:

Volume Name: vaulttest5
Type: Distributed-Disperse
Volume ID: 68e082a6-9819-4885-856c-1510cd201bd9
Status: Started
Number of Bricks: 36 x (8 + 2) = 360
Transport-type: tcp
Bricks:
Brick1: 10.1.2.1:/media/disk1
Brick2: 10.1.2.2:/media/disk1
Brick3: 10.1.2.3:/media/disk1
Brick4: 10.1.2.4:/media/disk1
Brick5: 10.1.2.5:/media/disk1
Brick6: 10.1.2.6:/media/disk1
Brick7: 10.1.2.7:/media/disk1
Brick8: 10.1.2.8:/media/disk1
Brick9: 10.1.2.9:/media/disk1
Brick10: 10.1.2.10:/media/disk1
Brick11: 10.1.2.1:/media/disk2
Brick12: 10.1.2.2:/media/disk2
Brick13: 10.1.2.3:/media/disk2
Brick14: 10.1.2.4:/media/disk2
Brick15: 10.1.2.5:/media/disk2
Brick16: 10.1.2.6:/media/disk2
Brick17: 10.1.2.7:/media/disk2
Brick18: 10.1.2.8:/media/disk2
Brick19: 10.1.2.9:/media/disk2
Brick20: 10.1.2.10:/media/disk2
...
Brick351: 10.1.2.1:/media/disk36
Brick352: 10.1.2.2:/media/disk36
Brick353: 10.1.2.3:/media/disk36
Brick354: 10.1.2.4:/media/disk36
Brick355: 10.1.2.5:/media/disk36
Brick356: 10.1.2.6:/media/disk36
Brick357: 10.1.2.7:/media/disk36
Brick358: 10.1.2.8:/media/disk36
Brick359: 10.1.2.9:/media/disk36
Brick360: 10.1.2.10:/media/disk36
Options Reconfigured:
performance.readdir-ahead: on

We did some performance testing and simulated proactive self-healing for
erasure coding. The disperse volume is created across nodes, so each
(8 + 2) set has exactly one brick on each of the 10 nodes.

*Description of problem*

I disconnected the *network of two nodes* and wrote some video files, and
*glusterfs* *wrote the video files to the remaining 8 nodes perfectly*. I
downloaded the uploaded file and it came back intact. Then I re-enabled
the network on the two nodes; proactive self-healing worked and rebuilt
the previously unavailable chunks of data on the re-enabled nodes from
the other 8 nodes. But when I then tried to download the same file, I got
an Input/output error and couldn't download it. I think there is an issue
in proactive self-healing.

We also tried the simulation with a single-node network failure and hit
the same I/O error while downloading the file.

*Error while downloading the file*

root@master02:/home/admin# rsync -r --progress /mnt/gluster/file13_AN ./1/file13_AN-2
sending incremental file list
file13_AN
  3,342,355,597 100%    4.87MB/s    0:10:54 (xfr#1, to-chk=0/1)
rsync: read errors mapping "/mnt/gluster/file13_AN": Input/output error (5)
WARNING: file13_AN failed verification -- update discarded (will try again).

root@master02:/home/admin# cp /mnt/gluster/file13_AN ./1/file13_AN-3
cp: error reading ‘/mnt/gluster/file13_AN’: Input/output error
cp: failed to extend ‘./1/file13_AN-3’: Input/output error

We can't tell whether the issue is in glusterfs 3.7.0 itself or in our
glusterfs configuration. Any help would be greatly appreciated.

--
Cheers
Backer
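P.S. For anyone who wants to reproduce this, the rough sequence is sketched
below. The interface name, the ifdown/ifup calls and the source path of the
video file are illustrative only (we actually cut the links at the switch),
and the heal-info step assumes "gluster volume heal ... info" already works
for disperse volumes on this build:

# 1) Cut the network on two of the ten storage nodes
ifdown eth0

# 2) From a client, write a large file through the FUSE mount and verify it
cp /data/video/file13_AN /mnt/gluster/file13_AN
md5sum /mnt/gluster/file13_AN      # still reads fine with 2 of 10 bricks per set down

# 3) Reconnect the nodes and let the self-heal daemon rebuild the fragments
ifup eth0
gluster volume heal vaulttest5 info    # wait until no entries remain

# 4) Read the file again -- this is where we now get EIO
rsync --progress /mnt/gluster/file13_AN ./1/file13_AN-2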
Xavier Hernandez
2015-May-27 07:52 UTC
[Gluster-users] Issue with Pro active self healing for Erasure coding
Hi,

Some Input/Output error issues have been identified and fixed. These fixes
will be available in 3.7.1.

Xavi

On 05/26/2015 10:15 AM, Mohamed Pakkeer wrote:
> [...]
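Until the fix is released, you can check whether the healed fragments are
actually consistent by comparing the ec xattrs of the file on the bricks.
This is only a quick sketch: the path assumes the file still sits at the
volume root, and trusted.ec.version / trusted.ec.size are the per-fragment
metadata that the disperse translator maintains:

# Run on each of the 10 nodes. The file hashes to a single disperse set,
# so only one /media/diskN per node will match; a fragment whose
# trusted.ec.* values disagree with the majority is the one breaking reads.
getfattr -d -m 'trusted.ec.' -e hex /media/disk*/file13_AN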