Ravishankar N
2016-Feb-10 01:38 UTC
[Gluster-users] Sparse files and heal full bug fix backport to 3.6.x
Hi Steve,
The patch already went in for 3.6.3
(https://bugzilla.redhat.com/show_bug.cgi?id=1187547). What version are you
using? If it is 3.6.3 or newer, can you share the logs if this happens
again? (Or see if you can reproduce the issue on your setup.)
Thanks,
Ravi

On 02/10/2016 02:25 AM, FNU Raghavendra Manjunath wrote:
>
> Adding Pranith, maintainer of the replicate feature.
>
> Regards,
> Raghavendra
>
> On Tue, Feb 9, 2016 at 3:33 PM, Steve Dainard <sdainard at spd1.com> wrote:
>
> There is a thread from 2014 mentioning that the heal process on a
> replica volume was de-sparsing sparse files (1).
>
> I've been experiencing the same issue on Gluster 3.6.x. I see there is
> a bug closed for a fix on Gluster 3.7 (2) and I'm wondering if this
> fix can be back-ported to Gluster 3.6.x?
>
> My experience has been:
> Replica 3 volume
> 1 brick went offline
> Brought brick back online
> Heal full on volume
> My 500G vm-storage volume went from ~280G used to >400G used.
>
> I've experienced this a couple of times previously, and used fallocate to
> re-sparse files, but this is cumbersome at best, and the lack of proper
> heal support on sparse files could be disastrous if I didn't have
> enough free space and ended up crashing my VMs when my storage domain
> ran out of space.
>
> Seeing as 3.6 is still a supported release, and 3.7 feels too bleeding
> edge for production systems, I think it makes sense to back-port this
> fix if possible.
>
> Thanks,
> Steve
>
> 1. https://www.gluster.org/pipermail/gluster-users/2014-November/019512.html
> 2. https://bugzilla.redhat.com/show_bug.cgi?id=1166020
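
For reference, the fallocate workaround mentioned above is roughly the
following (a minimal sketch; the image path is a placeholder, and
--dig-holes requires a reasonably recent util-linux):

# compare the apparent size with the blocks actually allocated on the brick
$ du -h --apparent-size /bricks/vm-storage/images/vm1.img
$ du -h /bricks/vm-storage/images/vm1.img

# punch holes back into zero-filled regions in place
# (the file should not be in active use while doing this)
$ fallocate --dig-holes /bricks/vm-storage/images/vm1.img

Running du again afterwards should show the allocated size drop back toward
the pre-heal figure.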
Steve Dainard
2016-Feb-10 21:40 UTC
[Gluster-users] Sparse files and heal full bug fix backport to 3.6.x
Most recently this happened on Gluster 3.6.6; I know it also happened on an
earlier minor release of 3.6, maybe 3.6.4. Currently on 3.6.8, I can try to
re-create it on another replica volume.

Which logs would give some useful info, and under which logging level?

From the host with the brick down (2016-02-06 00:40 was approximately when I
restarted glusterd to get the brick to start properly):

glfsheal-vm-storage.log
...
[2015-11-30 20:37:17.348673] I [glfs-resolve.c:836:__glfs_active_subvol] 0-vm-storage: switched to graph 676c7573-7465-7230-312e-706369632e75 (0)
[2016-02-06 00:27:15.282280] E [client-handshake.c:1496:client_query_portmap_cbk] 0-vm-storage-client-0: failed to get the port number for remote subvolume. Please run 'gluster volume status' on server to see if brick process is running.
[2016-02-06 00:27:49.797465] E [client-handshake.c:1496:client_query_portmap_cbk] 0-vm-storage-client-0: failed to get the port number for remote subvolume. Please run 'gluster volume status' on server to see if brick process is running.
[2016-02-06 00:27:54.126627] E [client-handshake.c:1496:client_query_portmap_cbk] 0-vm-storage-client-0: failed to get the port number for remote subvolume. Please run 'gluster volume status' on server to see if brick process is running.
[2016-02-06 00:27:58.449801] E [client-handshake.c:1496:client_query_portmap_cbk] 0-vm-storage-client-0: failed to get the port number for remote subvolume. Please run 'gluster volume status' on server to see if brick process is running.
[2016-02-06 00:31:56.139278] E [client-handshake.c:1496:client_query_portmap_cbk] 0-vm-storage-client-0: failed to get the port number for remote subvolume. Please run 'gluster volume status' on server to see if brick process is running.
<nothing newer in logs>

The brick log has a massive amount of these errors
(https://dl.dropboxusercontent.com/u/21916057/mnt-lv-vm-storage-vm-storage.log-20160207.tar.gz):

[2016-02-06 00:43:43.280048] E [socket.c:1972:__socket_read_frag] 0-rpc: wrong MSG-TYPE (1700885605) received from 142.104.230.33:38710
[2016-02-06 00:43:43.280159] E [socket.c:1972:__socket_read_frag] 0-rpc: wrong MSG-TYPE (1700885605) received from 142.104.230.33:38710
[2016-02-06 00:43:43.280325] E [socket.c:1972:__socket_read_frag] 0-rpc: wrong MSG-TYPE (1700885605) received from 142.104.230.33:38710

I only peer and mount gluster on a private subnet, so that's a bit odd, but
I don't know if it's related.

On Tue, Feb 9, 2016 at 5:38 PM, Ravishankar N <ravishankar at redhat.com> wrote:
> Hi Steve,
> The patch already went in for 3.6.3
> (https://bugzilla.redhat.com/show_bug.cgi?id=1187547). What version are you
> using? If it is 3.6.3 or newer, can you share the logs if this happens
> again? (Or see if you can reproduce the issue on your setup.)
> Thanks,
> Ravi
>
> On 02/10/2016 02:25 AM, FNU Raghavendra Manjunath wrote:
>
> Adding Pranith, maintainer of the replicate feature.
>
> Regards,
> Raghavendra
>
> On Tue, Feb 9, 2016 at 3:33 PM, Steve Dainard <sdainard at spd1.com> wrote:
>>
>> There is a thread from 2014 mentioning that the heal process on a
>> replica volume was de-sparsing sparse files (1).
>>
>> I've been experiencing the same issue on Gluster 3.6.x. I see there is
>> a bug closed for a fix on Gluster 3.7 (2) and I'm wondering if this
>> fix can be back-ported to Gluster 3.6.x?
>>
>> My experience has been:
>> Replica 3 volume
>> 1 brick went offline
>> Brought brick back online
>> Heal full on volume
>> My 500G vm-storage volume went from ~280G used to >400G used.
>>
>> I've experienced this a couple of times previously, and used fallocate to
>> re-sparse files, but this is cumbersome at best, and the lack of proper
>> heal support on sparse files could be disastrous if I didn't have
>> enough free space and ended up crashing my VMs when my storage domain
>> ran out of space.
>>
>> Seeing as 3.6 is still a supported release, and 3.7 feels too bleeding
>> edge for production systems, I think it makes sense to back-port this
>> fix if possible.
>>
>> Thanks,
>> Steve
>>
>> 1. https://www.gluster.org/pipermail/gluster-users/2014-November/019512.html
>> 2. https://bugzilla.redhat.com/show_bug.cgi?id=1166020
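
For anyone gathering the information Ravi asked for, a minimal sketch using
the standard gluster CLI (the volume name vm-storage is taken from this
thread; the diagnostics.* log-level options are an assumption that those
volume settings are available on your build):

# confirm the installed version on every peer
$ gluster --version

# check whether any entries are still pending heal
$ gluster volume heal vm-storage info

# raise log verbosity while reproducing (brick down -> up -> heal), then reset
$ gluster volume set vm-storage diagnostics.client-log-level DEBUG
$ gluster volume set vm-storage diagnostics.brick-log-level DEBUG
$ gluster volume heal vm-storage full
$ gluster volume reset vm-storage diagnostics.client-log-level
$ gluster volume reset vm-storage diagnostics.brick-log-level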