On 2020-06-21 14:20, Strahil Nikolov wrote:
> With every community project, you are in the position of a Beta
> Tester - no matter Fedora, Gluster or Ceph. So far, I had issues with
> upstream projects only during and immediately after patching - but
> this is properly mitigated with a reasonable patching strategy (patch
> the test environment and, several months later, patch prod with the
> same repos). Enterprise Linux breaks (and a lot), having 10 times
> more users and use cases, so you cannot expect to start using Gluster
> and assume that a free project won't break at all.
> Our part in this project is to help the devs create a test case for
> our workload, so regressions will be reduced to a minimum.

Well, this is true, and both devs & community deserve a big thanks for
all the work done.

> In the past 2 years, we got 2 major issues with VMware vSAN and 1
> major issue with an enterprise storage cluster (both solutions are
> quite expensive) - so I always recommend proper testing of your
> software.

Interesting, I am almost tempted to ask you what issue you had with
vSAN, but this is not the right mailing list ;)

> From my observations, almost nobody is complaining about Ganesha on
> the mailing list -> 50% are having issues with geo-replication, 20%
> are having issues with small-file performance, and the rest have
> issues with very old versions of Gluster -> v5 or older.

Mmm, I would swear to have read quite a few posts where the problem was
solved by migrating away from NFS Ganesha. Still, for hyperconverged
setups a problem remains: NFS on loopback/localhost is not 100%
supported (or, at least, RH is not willing to declare it
supportable/production ready [1]). A FUSE mount would be the more
natural way to access the underlying data.

> I can't say that a replace-brick on a 'replica 3' volume is riskier
> than a rebuild of a RAID, but I have noticed that nobody is following
> Red Hat's guide to use either:
> - a RAID6 of 12 disks (2-3 TB each)
> - a RAID10 of 12 disks (2-3 TB each)
> - JBOD disks in 'replica 3' mode (I'm not sure about the size RH
> recommends, most probably 2-3 TB)
> So far, I didn't have the opportunity to run on JBODs.

For the RAID6/10 setup, I found no issues: simply replace the broken
disk without involving Gluster at all. However, this also means facing
the "iops wall" I described earlier for a single-brick node. Going
full-Gluster with JBODs would be interesting from a performance
standpoint, but this complicates eventual recovery from bad disks.

Does someone use Gluster in JBOD mode? If so, can you share your
experience?
Thanks.

[1] https://access.redhat.com/solutions/22231 (account required)
[2] https://bugzilla.redhat.com/show_bug.cgi?id=489889 (old, but I
cannot find anything newer)

--
Danti Gionatan
Supporto Tecnico
Assyoma S.r.l. - www.assyoma.it
email: g.danti at assyoma.it - info at assyoma.it
GPG public key ID: FF5F32A8
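As a rough sketch of the two recovery paths compared above (hypothetical
device names, volume name and brick paths; assuming mdadm software RAID
and a 'replica 3' volume):

    # RAID-backed brick: swap the failed member at the RAID level,
    # Gluster never notices
    mdadm /dev/md0 --fail /dev/sdf --remove /dev/sdf
    mdadm /dev/md0 --add /dev/sdg
    cat /proc/mdstat                  # watch the array rebuild

    # JBOD brick: the failed disk *is* the brick, so Gluster has to heal it
    gluster volume replace-brick myvol \
        node1:/bricks/disk3/brick node1:/bricks/disk3new/brick commit force
    gluster volume heal myvol info    # monitor self-heal progress

In the RAID case the rebuild load stays inside the array; in the JBOD
case the full brick contents are re-copied from the surviving replicas
over the network, which is exactly the recovery cost being discussed.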
Hello Gionatan,

Using Gluster bricks in a RAID configuration might be safer and require
less work from Gluster admins, but it is a waste of disk space. Gluster
bricks are replicated (assuming you are creating a distributed-replicate
volume), so when a brick goes down it should be easy to recover and it
should not affect the clients' I/O. We are using JBOD in all of our
Gluster setups; overall, performance is good, and replacing a brick
works "most" of the time without issues.

On Sun, Jun 21, 2020 at 8:43 PM Gionatan Danti <g.danti at assyoma.it> wrote:
> [...]
> For the RAID6/10 setup, I found no issues: simply replace the broken
> disk without involving Gluster at all. However, this also means facing
> the "iops wall" I described earlier for a single-brick node. Going
> full-Gluster with JBODs would be interesting from a performance
> standpoint, but this complicates eventual recovery from bad disks.
>
> Does someone use Gluster in JBOD mode? If so, can you share your
> experience?
> Thanks.

--
Mahdi Adnan
IT Manager, Information Technology
EarthLink
VoIP: 69 - Cell: 07903316180
Website: www.earthlink.iq
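The brick replacement Mahdi mentions can also keep the original brick
path by using reset-brick (available since Gluster 3.9). A minimal
sketch, with hypothetical volume and brick names:

    # Take the failed brick offline; the volume keeps serving I/O from
    # the other two replicas
    gluster volume reset-brick myvol node2:/bricks/disk1/brick start

    # ... swap the disk, recreate the filesystem and remount it under
    # /bricks/disk1 ...

    # Bring the (now empty) brick back with the same path
    gluster volume reset-brick myvol \
        node2:/bricks/disk1/brick node2:/bricks/disk1/brick commit force
    gluster volume heal myvol info    # self-heal repopulates the new brick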
On Sun, 21 Jun 2020 at 19:43, Gionatan Danti <g.danti at assyoma.it> wrote:
> For the RAID6/10 setup, I found no issues: simply replace the broken
> disk without involving Gluster at all. However, this also means facing
> the "iops wall" I described earlier for a single-brick node. Going
> full-Gluster with JBODs would be interesting from a performance
> standpoint, but this complicates eventual recovery from bad disks.
>
> Does someone use Gluster in JBOD mode? If so, can you share your
> experience?
> Thanks.

Hi,

we once used Gluster with disks in JBOD mode (3 servers, 4 x 10 TB HDD
each, 4 x 3 = 12), and to make it short: in our special case it wasn't
much fun. Big HDDs, lots of small files, (highly) concurrent access
through our application. It was running quite fine, until a disk
failed. The reset-disk took ~30 (!) days, as you have Gluster
copying/restoring the data on top of the normal application
read/write load. After the first reset had finished, a couple of days
later another disk died, and the fun started again :-) Maybe a bad use
case.

With this experience, the next setup was: splitting the data into 2
chunks (high I/O, low I/O), 3 servers with 2 RAID10 arrays each (same
type of disk), each RAID used as a brick, resulting in replica 3:
1 x 3 = 3. Changing a failed disk now results in a complete RAID
resync, but regarding I/O this is far better than a reset-disk on HDDs
only. Only the regularly running RAID check was a bit of a performance
issue.

The latest setup (for the high-I/O part) looks like this: 3 servers,
10 disks of 10 TB each -> 5 RAID1 arrays per server, forming a
distributed-replicate volume with 5 bricks per server, 5 x 3 = 15. No
disk has failed so far (fingers crossed), but if a disk fails now,
Gluster is still running with all bricks available, and after changing
the failed one there is a single RAID1 resync running, affecting only
1/5 of the volume. In theory that should be better ;-) The regularly
running RAID checks are no problem so far: of the 15 RAID1 arrays only
1 is checked at a time, none in parallel.

Disclaimer: JBOD may work better with SSDs/NVMes - untested ;-)

Best regards,
Hubert
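A 5 x 3 distributed-replicate layout like the one Hubert describes
could be created roughly as follows (hypothetical hostnames, volume
name and brick paths; each /bricks/rN directory is assumed to sit on
its own RAID1 pair):

    # Bricks are grouped into replica sets in the order given, so each
    # set of three consecutive bricks spans the three servers
    gluster volume create highio replica 3 \
        srv1:/bricks/r1/brick srv2:/bricks/r1/brick srv3:/bricks/r1/brick \
        srv1:/bricks/r2/brick srv2:/bricks/r2/brick srv3:/bricks/r2/brick \
        srv1:/bricks/r3/brick srv2:/bricks/r3/brick srv3:/bricks/r3/brick \
        srv1:/bricks/r4/brick srv2:/bricks/r4/brick srv3:/bricks/r4/brick \
        srv1:/bricks/r5/brick srv2:/bricks/r5/brick srv3:/bricks/r5/brick
    gluster volume start highio
    gluster volume info highio        # verify the 5 x 3 brick layout

With this layout a single disk failure is absorbed by the local RAID1
resync, while Gluster-level healing is only needed if an entire brick
(both mirror members or the filesystem on top) is lost.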