Xavier Hernandez
2015-May-27 14:32 UTC
[Gluster-users] Issue with proactive self-healing for Erasure coding
Hi again,

in today's gluster meeting [1] it has been decided that 3.7.1 will be released
urgently to solve a bug in glusterd. All fixes planned for 3.7.1 will be moved
to 3.7.2, which will be released soon after.

Xavi

[1] http://meetbot.fedoraproject.org/gluster-meeting/2015-05-27/gluster-meeting.2015-05-27-12.01.html

On 05/27/2015 12:01 PM, Xavier Hernandez wrote:
> On 05/27/2015 11:26 AM, Mohamed Pakkeer wrote:
>> Hi Xavier,
>>
>> Thanks for your reply. When can we expect the 3.7.1 release?
>
> AFAIK a beta of 3.7.1 will be released very soon.
>
>> Cheers
>> Backer
>>
>> On Wed, May 27, 2015 at 1:22 PM, Xavier Hernandez <xhernandez at datalab.es> wrote:
>>
>> Hi,
>>
>> some Input/Output error issues have been identified and fixed. These
>> fixes will be available in 3.7.1.
>>
>> Xavi
>>
>> On 05/26/2015 10:15 AM, Mohamed Pakkeer wrote:
>>
>> Hi Glusterfs Experts,
>>
>> We are testing the glusterfs 3.7.0 tarball on our 10-node glusterfs
>> cluster. Each node has 36 drives; please find the volume info below:
>>
>> Volume Name: vaulttest5
>> Type: Distributed-Disperse
>> Volume ID: 68e082a6-9819-4885-856c-1510cd201bd9
>> Status: Started
>> Number of Bricks: 36 x (8 + 2) = 360
>> Transport-type: tcp
>> Bricks:
>> Brick1: 10.1.2.1:/media/disk1
>> Brick2: 10.1.2.2:/media/disk1
>> Brick3: 10.1.2.3:/media/disk1
>> Brick4: 10.1.2.4:/media/disk1
>> Brick5: 10.1.2.5:/media/disk1
>> Brick6: 10.1.2.6:/media/disk1
>> Brick7: 10.1.2.7:/media/disk1
>> Brick8: 10.1.2.8:/media/disk1
>> Brick9: 10.1.2.9:/media/disk1
>> Brick10: 10.1.2.10:/media/disk1
>> Brick11: 10.1.2.1:/media/disk2
>> Brick12: 10.1.2.2:/media/disk2
>> Brick13: 10.1.2.3:/media/disk2
>> Brick14: 10.1.2.4:/media/disk2
>> Brick15: 10.1.2.5:/media/disk2
>> Brick16: 10.1.2.6:/media/disk2
>> Brick17: 10.1.2.7:/media/disk2
>> Brick18: 10.1.2.8:/media/disk2
>> Brick19: 10.1.2.9:/media/disk2
>> Brick20: 10.1.2.10:/media/disk2
>> ...
>> ....
>> Brick351: 10.1.2.1:/media/disk36
>> Brick352: 10.1.2.2:/media/disk36
>> Brick353: 10.1.2.3:/media/disk36
>> Brick354: 10.1.2.4:/media/disk36
>> Brick355: 10.1.2.5:/media/disk36
>> Brick356: 10.1.2.6:/media/disk36
>> Brick357: 10.1.2.7:/media/disk36
>> Brick358: 10.1.2.8:/media/disk36
>> Brick359: 10.1.2.9:/media/disk36
>> Brick360: 10.1.2.10:/media/disk36
>> Options Reconfigured:
>> performance.readdir-ahead: on
>>
>> We did some performance testing and simulated proactive self-healing for
>> erasure coding. The disperse volume has been created across nodes.
>>
>> *Description of problem*
>>
>> I disconnected the network of two nodes and wrote some video files;
>> glusterfs wrote the video files to the remaining 8 nodes perfectly.
>> I downloaded one of the uploaded files and it came back intact. Then I
>> re-enabled the network on the two nodes, and the proactive self-healing
>> mechanism worked, rebuilding the missing chunks of data on the
>> re-enabled nodes from the other 8 nodes. But when I tried to download
>> the same file again, it returned an Input/Output error and I couldn't
>> download it. I think there is an issue in proactive self-healing.
>>
>> We also tried the simulation with a single-node network failure and hit
>> the same I/O error while downloading the file.
>>
>> *Error while downloading the file*
>>
>> root at master02:/home/admin# rsync -r --progress /mnt/gluster/file13_AN ./1/file13_AN-2
>> sending incremental file list
>> file13_AN
>>   3,342,355,597 100%    4.87MB/s    0:10:54 (xfr#1, to-chk=0/1)
>> rsync: read errors mapping "/mnt/gluster/file13_AN": Input/output error (5)
>> WARNING: file13_AN failed verification -- update discarded (will try again).
>>
>> root at master02:/home/admin# cp /mnt/gluster/file13_AN ./1/file13_AN-3
>> cp: error reading '/mnt/gluster/file13_AN': Input/output error
>> cp: failed to extend './1/file13_AN-3': Input/output error
>>
>> We can't tell whether the issue is in glusterfs 3.7.0 or in our
>> glusterfs configuration.
>>
>> Any help would be greatly appreciated.
>>
>> --
>> Cheers
>> Backer
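A note for anyone trying to reproduce the configuration above: the thread does
not show the exact commands used to create the volume, but a 36 x (8 + 2)
distributed-disperse layout with the same brick ordering could be built
roughly along these lines. This is a sketch only; the brick list is generated
to match the pattern shown in the volume info, and all names and paths are
assumptions to be adapted to the real environment.

    # Build the brick list in the same order as the volume info above:
    # disk1 on nodes 10.1.2.1-10, then disk2, ..., up to disk36.
    bricks=""
    for disk in $(seq 1 36); do
        for node in $(seq 1 10); do
            bricks="$bricks 10.1.2.$node:/media/disk$disk"
        done
    done

    # "disperse 10 redundancy 2" means 8 data + 2 redundancy fragments per
    # subvolume; with 360 bricks this gives 36 distributed disperse subvolumes.
    gluster volume create vaulttest5 disperse 10 redundancy 2 $bricks
    gluster volume start vaulttest5
    gluster volume set vaulttest5 performance.readdir-ahead on

With redundancy 2, each (8 + 2) subvolume should keep serving reads and writes
with up to two of its bricks unreachable, which is what the two-node failure
test described above relies on.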
Mohamed Pakkeer
2015-Jun-15 07:25 UTC
[Gluster-users] Issue with proactive self-healing for Erasure coding
Hi Xavier,

When can we expect the 3.7.2 release that fixes the I/O error discussed in
this mail thread?

Thanks
Backer

On Wed, May 27, 2015 at 8:02 PM, Xavier Hernandez <xhernandez at datalab.es> wrote:
> Hi again,
>
> in today's gluster meeting [1] it has been decided that 3.7.1 will be
> released urgently to solve a bug in glusterd. All fixes planned for 3.7.1
> will be moved to 3.7.2 which will be released soon after.
>
> Xavi
>
> [1]
> http://meetbot.fedoraproject.org/gluster-meeting/2015-05-27/gluster-meeting.2015-05-27-12.01.html
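For anyone hitting the same symptom while waiting for the fix, a rough way to
check whether the Input/Output error comes from an incomplete or inconsistent
heal is sketched below. The commands are standard gluster/Linux tools; the
file and brick paths are taken from the messages above, and the exact
extended-attribute names and log location may vary between versions.

    # On any node: list entries the self-heal daemon still considers damaged.
    gluster volume heal vaulttest5 info

    # On the storage nodes: compare the erasure-coding metadata of the file's
    # fragments. The fragments live on a single disperse subvolume, i.e. the
    # same diskN path on all 10 nodes; trusted.ec.version and trusted.ec.size
    # should match across those 10 bricks once healing has really finished.
    getfattr -d -m . -e hex /media/disk1/file13_AN

    # On the client: the FUSE mount log usually records which fragments were
    # rejected when the disperse translator returned EIO (the log name follows
    # the mount point, here /mnt/gluster).
    tail -n 100 /var/log/glusterfs/mnt-gluster.log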