Xavier Hernandez
2015-May-27 14:32 UTC
[Gluster-users] Issue with proactive self-healing for Erasure coding
Hi again,

in today's gluster meeting [1] it has been decided that 3.7.1 will be released
urgently to solve a bug in glusterd. All fixes planned for 3.7.1 will be moved
to 3.7.2, which will be released soon after.

Xavi

[1] http://meetbot.fedoraproject.org/gluster-meeting/2015-05-27/gluster-meeting.2015-05-27-12.01.html

On 05/27/2015 12:01 PM, Xavier Hernandez wrote:
> On 05/27/2015 11:26 AM, Mohamed Pakkeer wrote:
>> Hi Xavier,
>>
>> Thanks for your reply. When can we expect the 3.7.1 release?
>
> AFAIK a beta of 3.7.1 will be released very soon.
>
>> Cheers
>> Backer
>>
>> On Wed, May 27, 2015 at 1:22 PM, Xavier Hernandez <xhernandez at datalab.es> wrote:
>>
>> Hi,
>>
>> some Input/Output error issues have been identified and fixed. These
>> fixes will be available in 3.7.1.
>>
>> Xavi
>>
>> On 05/26/2015 10:15 AM, Mohamed Pakkeer wrote:
>>
>> Hi Glusterfs Experts,
>>
>> We are testing the glusterfs 3.7.0 tarball on our 10-node glusterfs
>> cluster. Each node has 36 drives; please find the volume info below:
>>
>> Volume Name: vaulttest5
>> Type: Distributed-Disperse
>> Volume ID: 68e082a6-9819-4885-856c-1510cd201bd9
>> Status: Started
>> Number of Bricks: 36 x (8 + 2) = 360
>> Transport-type: tcp
>> Bricks:
>> Brick1: 10.1.2.1:/media/disk1
>> Brick2: 10.1.2.2:/media/disk1
>> Brick3: 10.1.2.3:/media/disk1
>> Brick4: 10.1.2.4:/media/disk1
>> Brick5: 10.1.2.5:/media/disk1
>> Brick6: 10.1.2.6:/media/disk1
>> Brick7: 10.1.2.7:/media/disk1
>> Brick8: 10.1.2.8:/media/disk1
>> Brick9: 10.1.2.9:/media/disk1
>> Brick10: 10.1.2.10:/media/disk1
>> Brick11: 10.1.2.1:/media/disk2
>> Brick12: 10.1.2.2:/media/disk2
>> Brick13: 10.1.2.3:/media/disk2
>> Brick14: 10.1.2.4:/media/disk2
>> Brick15: 10.1.2.5:/media/disk2
>> Brick16: 10.1.2.6:/media/disk2
>> Brick17: 10.1.2.7:/media/disk2
>> Brick18: 10.1.2.8:/media/disk2
>> Brick19: 10.1.2.9:/media/disk2
>> Brick20: 10.1.2.10:/media/disk2
>> ...
>> ....
>> Brick351: 10.1.2.1:/media/disk36
>> Brick352: 10.1.2.2:/media/disk36
>> Brick353: 10.1.2.3:/media/disk36
>> Brick354: 10.1.2.4:/media/disk36
>> Brick355: 10.1.2.5:/media/disk36
>> Brick356: 10.1.2.6:/media/disk36
>> Brick357: 10.1.2.7:/media/disk36
>> Brick358: 10.1.2.8:/media/disk36
>> Brick359: 10.1.2.9:/media/disk36
>> Brick360: 10.1.2.10:/media/disk36
>> Options Reconfigured:
>> performance.readdir-ahead: on
>>
>> We did some performance testing and simulated proactive self-healing for
>> erasure coding. The disperse volume has been created across nodes.
>>
>> *Description of problem*
>>
>> I disconnected the network of two nodes and wrote some video files;
>> glusterfs wrote the video files to the remaining 8 nodes perfectly.
>> I downloaded one of the uploaded files and it came back intact. Then I
>> re-enabled the network on the two nodes, and the proactive self-healing
>> mechanism worked, rebuilding the missing chunks of data on the
>> re-enabled nodes from the other 8 nodes. But when I tried to download
>> the same file again, it returned an Input/Output error and I couldn't
>> download it. I think there is an issue in proactive self-healing.
>>
>> We also tried the simulation with a single-node network failure and hit
>> the same I/O error while downloading the file.
>>
>> *Error while downloading the file*
>>
>> root at master02:/home/admin# rsync -r --progress /mnt/gluster/file13_AN ./1/file13_AN-2
>> sending incremental file list
>> file13_AN
>>   3,342,355,597 100%    4.87MB/s    0:10:54 (xfr#1, to-chk=0/1)
>> rsync: read errors mapping "/mnt/gluster/file13_AN": Input/output error (5)
>> WARNING: file13_AN failed verification -- update discarded (will try again).
>>
>> root at master02:/home/admin# cp /mnt/gluster/file13_AN ./1/file13_AN-3
>> cp: error reading '/mnt/gluster/file13_AN': Input/output error
>> cp: failed to extend './1/file13_AN-3': Input/output error
>>
>> We can't tell whether the issue is in glusterfs 3.7.0 or in our
>> glusterfs configuration.
>>
>> Any help would be greatly appreciated.
>>
>> --
>> Cheers
>> Backer
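A note for anyone trying to reproduce the configuration above: the thread does
not show the exact commands used to create the volume, but a 36 x (8 + 2)
distributed-disperse layout with the same brick ordering could be built
roughly along these lines. This is a sketch only; the brick list is generated
to match the pattern shown in the volume info, and all names and paths are
assumptions to be adapted to the real environment.

    # Build the brick list in the same order as the volume info above:
    # disk1 on nodes 10.1.2.1-10, then disk2, ..., up to disk36.
    bricks=""
    for disk in $(seq 1 36); do
        for node in $(seq 1 10); do
            bricks="$bricks 10.1.2.$node:/media/disk$disk"
        done
    done

    # "disperse 10 redundancy 2" means 8 data + 2 redundancy fragments per
    # subvolume; with 360 bricks this gives 36 distributed disperse subvolumes.
    gluster volume create vaulttest5 disperse 10 redundancy 2 $bricks
    gluster volume start vaulttest5
    gluster volume set vaulttest5 performance.readdir-ahead on

With redundancy 2, each (8 + 2) subvolume should keep serving reads and writes
with up to two of its bricks unreachable, which is what the two-node failure
test described above relies on.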
Mohamed Pakkeer
2015-Jun-15 07:25 UTC
[Gluster-users] Issue with proactive self-healing for Erasure coding
Hi Xavier,

When can we expect the 3.7.2 release that fixes the I/O error discussed in
this mail thread?

Thanks
Backer

On Wed, May 27, 2015 at 8:02 PM, Xavier Hernandez <xhernandez at datalab.es> wrote:
> Hi again,
>
> in today's gluster meeting [1] it has been decided that 3.7.1 will be
> released urgently to solve a bug in glusterd. All fixes planned for 3.7.1
> will be moved to 3.7.2 which will be released soon after.
>
> Xavi
>
> [1]
> http://meetbot.fedoraproject.org/gluster-meeting/2015-05-27/gluster-meeting.2015-05-27-12.01.html
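For anyone hitting the same symptom while waiting for the fix, a rough way to
check whether the Input/Output error comes from an incomplete or inconsistent
heal is sketched below. The commands are standard gluster/Linux tools; the
file and brick paths are taken from the messages above, and the exact
extended-attribute names and log location may vary between versions.

    # On any node: list entries the self-heal daemon still considers damaged.
    gluster volume heal vaulttest5 info

    # On the storage nodes: compare the erasure-coding metadata of the file's
    # fragments. The fragments live on a single disperse subvolume, i.e. the
    # same diskN path on all 10 nodes; trusted.ec.version and trusted.ec.size
    # should match across those 10 bricks once healing has really finished.
    getfattr -d -m . -e hex /media/disk1/file13_AN

    # On the client: the FUSE mount log usually records which fragments were
    # rejected when the disperse translator returned EIO (the log name follows
    # the mount point, here /mnt/gluster).
    tail -n 100 /var/log/glusterfs/mnt-gluster.log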