Ravishankar N
2016-Feb-10 01:38 UTC
[Gluster-users] Sparse files and heal full bug fix backport to 3.6.x
Hi Steve,
The patch already went in for 3.6.3
(https://bugzilla.redhat.com/show_bug.cgi?id=1187547). What version are you
using? If it is 3.6.3 or newer, can you share the logs if this happens
again? (Or see if you can reproduce the issue on your setup.)
Thanks,
Ravi

On 02/10/2016 02:25 AM, FNU Raghavendra Manjunath wrote:
>
> Adding Pranith, maintainer of the replicate feature.
>
> Regards,
> Raghavendra
>
> On Tue, Feb 9, 2016 at 3:33 PM, Steve Dainard <sdainard at spd1.com> wrote:
>
> There is a thread from 2014 mentioning that the heal process on a
> replica volume was de-sparsing sparse files (1).
>
> I've been experiencing the same issue on Gluster 3.6.x. I see there is
> a bug closed for a fix on Gluster 3.7 (2) and I'm wondering if this
> fix can be back-ported to Gluster 3.6.x?
>
> My experience has been:
> Replica 3 volume
> 1 brick went offline
> Brought brick back online
> Heal full on volume
> My 500G vm-storage volume went from ~280G used to >400G used.
>
> I've experienced this a couple of times previously, and used fallocate to
> re-sparse files, but this is cumbersome at best, and the lack of proper
> heal support on sparse files could be disastrous if I didn't have
> enough free space and ended up crashing my VMs when my storage domain
> ran out of space.
>
> Seeing as 3.6 is still a supported release, and 3.7 feels too bleeding
> edge for production systems, I think it makes sense to back-port this
> fix if possible.
>
> Thanks,
> Steve
>
> 1. https://www.gluster.org/pipermail/gluster-users/2014-November/019512.html
> 2. https://bugzilla.redhat.com/show_bug.cgi?id=1166020
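
For reference, the fallocate workaround mentioned above is roughly the
following (a minimal sketch; the image path is a placeholder, and
--dig-holes requires a reasonably recent util-linux):

# compare the apparent size with the blocks actually allocated on the brick
$ du -h --apparent-size /bricks/vm-storage/images/vm1.img
$ du -h /bricks/vm-storage/images/vm1.img

# punch holes back into zero-filled regions in place
# (the file should not be in active use while doing this)
$ fallocate --dig-holes /bricks/vm-storage/images/vm1.img

Running du again afterwards should show the allocated size drop back toward
the pre-heal figure.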
Steve Dainard
2016-Feb-10 21:40 UTC
[Gluster-users] Sparse files and heal full bug fix backport to 3.6.x
Most recently this happened on Gluster 3.6.6; I know it also happened on an
earlier minor release of 3.6, maybe 3.6.4. Currently on 3.6.8, I can try to
re-create it on another replica volume.

Which logs would give some useful info, and under which logging level?

From the host with the brick down (2016-02-06 00:40 was approximately when I
restarted glusterd to get the brick to start properly):

glfsheal-vm-storage.log
...
[2015-11-30 20:37:17.348673] I [glfs-resolve.c:836:__glfs_active_subvol] 0-vm-storage: switched to graph 676c7573-7465-7230-312e-706369632e75 (0)
[2016-02-06 00:27:15.282280] E [client-handshake.c:1496:client_query_portmap_cbk] 0-vm-storage-client-0: failed to get the port number for remote subvolume. Please run 'gluster volume status' on server to see if brick process is running.
[2016-02-06 00:27:49.797465] E [client-handshake.c:1496:client_query_portmap_cbk] 0-vm-storage-client-0: failed to get the port number for remote subvolume. Please run 'gluster volume status' on server to see if brick process is running.
[2016-02-06 00:27:54.126627] E [client-handshake.c:1496:client_query_portmap_cbk] 0-vm-storage-client-0: failed to get the port number for remote subvolume. Please run 'gluster volume status' on server to see if brick process is running.
[2016-02-06 00:27:58.449801] E [client-handshake.c:1496:client_query_portmap_cbk] 0-vm-storage-client-0: failed to get the port number for remote subvolume. Please run 'gluster volume status' on server to see if brick process is running.
[2016-02-06 00:31:56.139278] E [client-handshake.c:1496:client_query_portmap_cbk] 0-vm-storage-client-0: failed to get the port number for remote subvolume. Please run 'gluster volume status' on server to see if brick process is running.
<nothing newer in logs>

The brick log has a massive amount of these errors
(https://dl.dropboxusercontent.com/u/21916057/mnt-lv-vm-storage-vm-storage.log-20160207.tar.gz):

[2016-02-06 00:43:43.280048] E [socket.c:1972:__socket_read_frag] 0-rpc: wrong MSG-TYPE (1700885605) received from 142.104.230.33:38710
[2016-02-06 00:43:43.280159] E [socket.c:1972:__socket_read_frag] 0-rpc: wrong MSG-TYPE (1700885605) received from 142.104.230.33:38710
[2016-02-06 00:43:43.280325] E [socket.c:1972:__socket_read_frag] 0-rpc: wrong MSG-TYPE (1700885605) received from 142.104.230.33:38710

I only peer and mount gluster on a private subnet, so that's a bit odd, but
I don't know if it's related.

On Tue, Feb 9, 2016 at 5:38 PM, Ravishankar N <ravishankar at redhat.com> wrote:
> Hi Steve,
> The patch already went in for 3.6.3
> (https://bugzilla.redhat.com/show_bug.cgi?id=1187547). What version are you
> using? If it is 3.6.3 or newer, can you share the logs if this happens
> again? (Or see if you can reproduce the issue on your setup.)
> Thanks,
> Ravi
>
> On 02/10/2016 02:25 AM, FNU Raghavendra Manjunath wrote:
>
> Adding Pranith, maintainer of the replicate feature.
>
> Regards,
> Raghavendra
>
> On Tue, Feb 9, 2016 at 3:33 PM, Steve Dainard <sdainard at spd1.com> wrote:
>>
>> There is a thread from 2014 mentioning that the heal process on a
>> replica volume was de-sparsing sparse files (1).
>>
>> I've been experiencing the same issue on Gluster 3.6.x. I see there is
>> a bug closed for a fix on Gluster 3.7 (2) and I'm wondering if this
>> fix can be back-ported to Gluster 3.6.x?
>>
>> My experience has been:
>> Replica 3 volume
>> 1 brick went offline
>> Brought brick back online
>> Heal full on volume
>> My 500G vm-storage volume went from ~280G used to >400G used.
>>
>> I've experienced this a couple of times previously, and used fallocate to
>> re-sparse files, but this is cumbersome at best, and the lack of proper
>> heal support on sparse files could be disastrous if I didn't have
>> enough free space and ended up crashing my VMs when my storage domain
>> ran out of space.
>>
>> Seeing as 3.6 is still a supported release, and 3.7 feels too bleeding
>> edge for production systems, I think it makes sense to back-port this
>> fix if possible.
>>
>> Thanks,
>> Steve
>>
>> 1. https://www.gluster.org/pipermail/gluster-users/2014-November/019512.html
>> 2. https://bugzilla.redhat.com/show_bug.cgi?id=1166020
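
For anyone gathering the information Ravi asked for, a minimal sketch using
the standard gluster CLI (the volume name vm-storage is taken from this
thread; the diagnostics.* log-level options are an assumption that those
volume settings are available on your build):

# confirm the installed version on every peer
$ gluster --version

# check whether any entries are still pending heal
$ gluster volume heal vm-storage info

# raise log verbosity while reproducing (brick down -> up -> heal), then reset
$ gluster volume set vm-storage diagnostics.client-log-level DEBUG
$ gluster volume set vm-storage diagnostics.brick-log-level DEBUG
$ gluster volume heal vm-storage full
$ gluster volume reset vm-storage diagnostics.client-log-level
$ gluster volume reset vm-storage diagnostics.brick-log-level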