Hi Stefan,

Sorry to hear that things are breaking a lot in your cluster. Please file a bug (or bugs) with the necessary information so we can take a look. If you have already filed one, share it here so we are reminded of it. Fixing a broken cluster state should be easy with Gluster; there are a few older threads on the same topic that you should be able to find.

Do consider that the devs have limited bandwidth. We do look at the issues and are actively fixing them. We may take some time, as we expect the community to help each other as well; if they can't resolve something, we step in and try to sort it out.
FYI: you can see dozens of bugs being worked on even in the past 2 days: https://review.gluster.org/#/q/status:open+project:glusterfs
There are other activities happening as well to make the Gluster project healthier, like Glusto. We are working on this testing framework to cover as many cases as possible. If you can send out a test case, it will be beneficial for you as well as for the community.

We don't see many people sending mails to say that their cluster is healthy and they are happy (perhaps they think that would be spamming the list, which it wouldn't be -- it helps us understand how well things are going).
Thanks, Erik and Strahil, for sharing your experience. It means a lot to us :)
People usually prefer to send a mail when something breaks, and that's one main reason all the threads you read create negativity.

Do let us know what the issue is and we will try our best to help you out.

Regards,
Hari.

On Wed, Feb 12, 2020 at 11:58 AM Strahil Nikolov <hunter86_bg at yahoo.com> wrote:
> On February 12, 2020 12:28:14 AM GMT+02:00, Erik Jacobson <erik.jacobson at hpe.com> wrote:
> >> Looking through the last couple of weeks on this mailing list and
> >> reflecting on our own experiences, I have to ask: what is the status of
> >> GlusterFS? So many people here are reporting bugs and no solutions are
> >> in sight. GlusterFS clusters break left and right, reboots of a node
> >> have become a guarantee of instability and broken clusters, with no way
> >> to fix them. And all of that with recommended settings and, in our case,
> >> enterprise hardware underneath.
> >
> > I have been one of the people asking questions. I sometimes get an
> > answer, which I appreciate. Other times not. But I'm not paying for
> > support in this forum, so I appreciate what I can get. My questions
> > are sometimes very hard to summarize, and I can't say I've been offering
> > help as much as I ask. I think I will try to do better.
> >
> > Just to counter with something cool: as we speak, I'm working on a
> > 2,000-node cluster that will soon be a 5120-node cluster. We're
> > validating it with the newest version of our cluster manager.
> >
> > It has 12 leader nodes (soon to be 24) that are Gluster servers and
> > gNFS servers.
> >
> > I am validating Gluster 7.2 (updating from 4.6). Things are looking very
> > good. 5120 nodes using an RO NFS root with RW NFS overmounts (for things
> > like /var, /etc, ...):
> > - boot 1 (where each node creates a RW XFS image on top of NFS for its
> >   writable area, then syncs /var, /etc, etc.) -- a full boot is 15-16
> >   minutes for 2007 nodes.
> > - boot 2 (where the writable area pre-exists and is reused, just
> >   re-rsynced) -- 8-9 minutes to boot 2007 nodes.
> >
> > This is similar to Gluster 4, but I think it says something that we have
> > not had any failures in this setup on a bleeding-edge release.
> >
> > We also use a different volume, shared between the leaders and the head
> > node, for shared-storage consoles and system logs. It's working great.
> >
> > I haven't had time to test other solutions. Our old solution from the
> > SGI days (ICE, ICE X, etc.) was a different model where each leader
> > served a set of nodes and NFS-booted 288 or so. No shared storage.
> >
> > Like you, I've wondered if something else matches this solution. We like
> > the shared storage and the ability for a leader to drop and not take
> > 288 nodes with it.
> >
> > (All nodes running RHEL 8.0, GlusterFS 7.2, CTDB 4.9.1)
> >
> > So we can say Gluster is now providing the network boot solution for
> > two supercomputers.
> >
> > Erik
>
> Hi Stefan,
>
> It seems that the devs are not so active on the mailing lists, but based
> on my experience the bugs get fixed in a reasonable timeframe. I admit
> that I was quite frustrated when my Gluster v6.5 to v6.6 upgrade made my
> lab useless for 2 weeks and the only help came from an oVirt dev, while
> gluster-users/devel were semi-silent.
> Yet, I'm not paying for any support, and I know that any help here is
> just good will.
> I hope this has nothing to do with the recent acquisition by IBM, but we
> will see.
>
> There is a reason why Red Hat clients are still using Gluster v3 (even
> with backports) - it is the most tested version of Gluster.
> For me, Gluster v4+ compared to v3 is like Fedora to RHEL. After all,
> the upstream is not so well tested, and the Gluster community is taking
> over here - reporting bugs, sharing workarounds, giving advice.
>
> Of course, if you need a rock-solid Gluster environment, you definitely
> need the enterprise solution with its 24/7 support.
>
> Keep in mind that even the most expensive storage arrays break after an
> upgrade (it happened 3 times in less than 2 weeks, with 2k+ machines
> read-only until the vendor provided a new patch), so the issues in
> Gluster are nothing new, and we should not forget that Gluster is free
> (and doesn't cost millions like some arrays).
> The only mitigation is to thoroughly test each patch on a cluster that
> provides storage for your dev/test clients.
>
> I hope you don't get me wrong - just lower your expectations: even arrays
> that cost millions break, so Gluster is no exception, but at least it's
> open source and free.
>
> Best Regards,
> Strahil Nikolov

--
Regards,
Hari Gowtham.
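For anyone who hasn't built a setup like Erik's, here is a minimal sketch of the per-node writable-overlay step he describes. The image size, all paths, and the bind-mount layout are assumptions for illustration, not the actual cluster manager code:

#!/bin/sh
set -e

NFS_RW=/mnt/nfs-rw                    # RW NFS overmount (assumed path)
IMG="$NFS_RW/$(hostname -s).img"      # per-node XFS image file kept on NFS
MNT=/rwfs                             # local mount point for the image

if [ ! -f "$IMG" ]; then
    # boot 1: create a sparse image on the NFS share and format it as XFS
    truncate -s 8G "$IMG"
    mkfs.xfs -q "$IMG"
fi

mkdir -p "$MNT"
mount -o loop "$IMG" "$MNT"           # loop-mount the image (boot 1 and boot 2)

# (re)sync the writable directories from the RO NFS root into the image,
# then bind-mount them over the read-only copies
for dir in var etc; do
    mkdir -p "$MNT/$dir"
    rsync -a --delete "/$dir/" "$MNT/$dir/"
    mount --bind "$MNT/$dir" "/$dir"
done

On boot 2 the image already exists, so only the loop mount and the rsync run, which lines up with the faster second-boot times Erik reports.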
Hi Stefan,

Adding to what Hari said, I want to share a link [1] which talks about the future releases. Apart from that, I would suggest you join the community meeting and share your thoughts.

Hi Strahil,

We have a separate version system for "Red Hat's clients" and for the community, so it's not Gluster v3. We encourage everyone to use the latest release, which we believe is more stable and comes with the latest fixes.

[1] https://lists.gluster.org/pipermail/gluster-devel/2020-January/056779.html

/sunny

On Wed, Feb 12, 2020 at 7:23 AM Hari Gowtham <hgowtham at redhat.com> wrote:
> [...]
I can echo the sentiment. I lived with a horrible memory leak on FUSE-mounted GlusterFS all the way from some 3.x version (I think it went bad in 3.7, but I cannot remember) through 4.x, over a period of 2+ years, where the only solution was to reboot the servers every 15 to 30 days to clear the memory (the servers have 64 GB of RAM). After the latest upgrades of the 4.x series, the reboot had to be done every 15 days. Finally, on 6.6 the issue seems fixed (knock on wood).

# free
              total        used        free      shared  buff/cache   available
Mem:       65771848    16740376     7500048      755468    41531424    47707444
Swap:             0           0           0

[root@hostname ~]# uptime
 07:05:49 up 31 days, 8:47, 1 user, load average: 0.40, 0.36, 0.33

If I had been running the 4.x series on the above example, Gluster would already have eaten 40-50 GB of RAM and the server would probably have about 5 GB free at best, if it hadn't already started misbehaving after 2 weeks.

I have had success running MooseFS Pro; I am using the "unreleased" 4.x branch, which we got for use in an educational institution, and I have not had any issues with it. If it weren't for the elevated cost, I would have already replaced every other instance of GlusterFS I run with MooseFS.

My takeaway is that things do eventually get fixed, but I have suffered through a lot using GlusterFS and Samba. Many different combinations of GlusterFS and the Samba VFS gluster plugin have stolen my peaceful sleep. To this date I have not gone back to the Samba VFS gluster plugin, after a catastrophe where it broke on a Samba upgrade and nobody could write files back to the servers via Samba. That is when I changed over to FUSE mounts and then lived with the memory leak problems for so long.

Diego

On Wed, Feb 12, 2020 at 2:23 AM Hari Gowtham <hgowtham at redhat.com> wrote:
> [...]
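Since the only mitigation for the leak Diego describes was a scheduled reboot, a small cron-driven check can at least give early warning before a box starts misbehaving. A rough sketch, assuming the FUSE client runs as a process named glusterfs and that a local mail command is configured; the 32 GiB threshold and the "root" recipient are made-up values to adjust for your environment:

#!/bin/sh
# Hypothetical cron-driven check: warn when the combined RSS of all
# processes named glusterfs crosses a threshold.
# The 32 GiB threshold and the "root" recipient are illustrative assumptions.

THRESHOLD_KB=$((32 * 1024 * 1024))   # 32 GiB, expressed in kB to match ps

# Sum the resident set size (kB) of every process named glusterfs
RSS_KB=$(ps -C glusterfs -o rss= | awk '{ sum += $1 } END { print sum + 0 }')

if [ "$RSS_KB" -gt "$THRESHOLD_KB" ]; then
    echo "glusterfs RSS is ${RSS_KB} kB on $(hostname)" |
        mail -s "glusterfs memory warning: $(hostname)" root
fi

It doesn't fix anything, but it turns a server quietly eating its RAM into a warning you can schedule a reboot around.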