On 2020-06-21 14:20, Strahil Nikolov wrote:
> With every community project, you are in the position of a Beta
> Tester - no matter Fedora, Gluster or Ceph. So far, I had issues with
> upstream projects only during and immediately after patching - but
> this is properly mitigated with a reasonable patching strategy (patch
> the test environment and, several months later, patch prod with the
> same repos). Enterprise Linux breaks (and a lot), having 10 times
> more users and use cases, so you cannot expect to start using Gluster
> and assume that a free project won't break at all.
> Our part in this project is to help the devs create a test case for
> our workload, so regressions will be reduced to a minimum.

Well, this is true, and both devs & community deserve a big thanks for
all the work done.

> In the past 2 years, we got 2 major issues with VMware vSAN and 1
> major issue with an enterprise storage cluster (both solutions are
> quite expensive) - so I always recommend proper testing of your
> software.

Interesting, I am almost tempted to ask you what issue you had with
vSAN, but this is not the right mailing list ;)

> From my observations, almost nobody is complaining about Ganesha on
> the mailing list -> 50% are having issues with geo-replication, 20%
> are having issues with small-file performance, and the rest have
> issues with very old versions of Gluster -> v5 or older.

Mmm, I would swear to have read quite a few posts where the problem was
solved by migrating away from NFS Ganesha. Still, for hyperconverged
setups a problem remains: NFS on loopback/localhost is not 100%
supported (or, at least, RH is not willing to declare it
supportable/production ready [1]). A FUSE mount would be the more
natural way to access the underlying data.

> I can't say that a replace-brick on a 'replica 3' volume is riskier
> than a rebuild of a RAID, but I have noticed that nobody is following
> Red Hat's guide to use either:
> - a RAID6 of 12 disks (2-3 TB each)
> - a RAID10 of 12 disks (2-3 TB each)
> - JBOD disks in 'replica 3' mode (I'm not sure about the size RH
> recommends, most probably 2-3 TB)
> So far, I didn't have the opportunity to run on JBODs.

For the RAID6/10 setup, I found no issues: simply replace the broken
disk without involving Gluster at all. However, this also means facing
the "iops wall" I described earlier for a single-brick node. Going
full-Gluster with JBODs would be interesting from a performance
standpoint, but this complicates eventual recovery from bad disks.

Does someone use Gluster in JBOD mode? If so, can you share your
experience?
Thanks.

[1] https://access.redhat.com/solutions/22231 (account required)
[2] https://bugzilla.redhat.com/show_bug.cgi?id=489889 (old, but I
cannot find anything newer)

--
Danti Gionatan
Supporto Tecnico
Assyoma S.r.l. - www.assyoma.it
email: g.danti at assyoma.it - info at assyoma.it
GPG public key ID: FF5F32A8
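As a rough sketch of the two recovery paths compared above (hypothetical
device names, volume name and brick paths; assuming mdadm software RAID
and a 'replica 3' volume):

    # RAID-backed brick: swap the failed member at the RAID level,
    # Gluster never notices
    mdadm /dev/md0 --fail /dev/sdf --remove /dev/sdf
    mdadm /dev/md0 --add /dev/sdg
    cat /proc/mdstat                  # watch the array rebuild

    # JBOD brick: the failed disk *is* the brick, so Gluster has to heal it
    gluster volume replace-brick myvol \
        node1:/bricks/disk3/brick node1:/bricks/disk3new/brick commit force
    gluster volume heal myvol info    # monitor self-heal progress

In the RAID case the rebuild load stays inside the array; in the JBOD
case the full brick contents are re-copied from the surviving replicas
over the network, which is exactly the recovery cost being discussed.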
Hello Gionatan,

Using Gluster bricks in a RAID configuration might be safer and require
less work from Gluster admins, but it is a waste of disk space. Gluster
bricks are replicated (assuming you are creating a distributed-replicate
volume), so when a brick goes down it should be easy to recover and it
should not affect the clients' I/O. We are using JBOD in all of our
Gluster setups; overall, performance is good, and replacing a brick
works "most" of the time without issues.

On Sun, Jun 21, 2020 at 8:43 PM Gionatan Danti <g.danti at assyoma.it> wrote:
> [...]
> For the RAID6/10 setup, I found no issues: simply replace the broken
> disk without involving Gluster at all. However, this also means facing
> the "iops wall" I described earlier for a single-brick node. Going
> full-Gluster with JBODs would be interesting from a performance
> standpoint, but this complicates eventual recovery from bad disks.
>
> Does someone use Gluster in JBOD mode? If so, can you share your
> experience?
> Thanks.

--
Mahdi Adnan
IT Manager, Information Technology
EarthLink
VoIP: 69 - Cell: 07903316180
Website: www.earthlink.iq
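The brick replacement Mahdi mentions can also keep the original brick
path by using reset-brick (available since Gluster 3.9). A minimal
sketch, with hypothetical volume and brick names:

    # Take the failed brick offline; the volume keeps serving I/O from
    # the other two replicas
    gluster volume reset-brick myvol node2:/bricks/disk1/brick start

    # ... swap the disk, recreate the filesystem and remount it under
    # /bricks/disk1 ...

    # Bring the (now empty) brick back with the same path
    gluster volume reset-brick myvol \
        node2:/bricks/disk1/brick node2:/bricks/disk1/brick commit force
    gluster volume heal myvol info    # self-heal repopulates the new brick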
On Sun, 21 Jun 2020 at 19:43, Gionatan Danti <g.danti at assyoma.it> wrote:
> For the RAID6/10 setup, I found no issues: simply replace the broken
> disk without involving Gluster at all. However, this also means facing
> the "iops wall" I described earlier for a single-brick node. Going
> full-Gluster with JBODs would be interesting from a performance
> standpoint, but this complicates eventual recovery from bad disks.
>
> Does someone use Gluster in JBOD mode? If so, can you share your
> experience?
> Thanks.

Hi,

we once used Gluster with disks in JBOD mode (3 servers, 4 x 10 TB HDD
each, 4 x 3 = 12), and to make it short: in our special case it wasn't
much fun. Big HDDs, lots of small files, (highly) concurrent access
through our application. It was running quite fine, until a disk
failed. The reset-disk took ~30 (!) days, as you have Gluster
copying/restoring the data on top of the normal application
read/write load. After the first reset had finished, a couple of days
later another disk died, and the fun started again :-) Maybe a bad use
case.

With this experience, the next setup was: splitting the data into 2
chunks (high I/O, low I/O), 3 servers with 2 RAID10 arrays each (same
type of disk), each RAID used as a brick, resulting in replica 3:
1 x 3 = 3. Changing a failed disk now results in a complete RAID
resync, but regarding I/O this is far better than a reset-disk on HDDs
only. Only the regularly running RAID check was a bit of a performance
issue.

The latest setup (for the high-I/O part) looks like this: 3 servers,
10 disks of 10 TB each -> 5 RAID1 arrays per server, forming a
distributed-replicate volume with 5 bricks per server, 5 x 3 = 15. No
disk has failed so far (fingers crossed), but if a disk fails now,
Gluster is still running with all bricks available, and after changing
the failed one there is a single RAID1 resync running, affecting only
1/5 of the volume. In theory that should be better ;-) The regularly
running RAID checks are no problem so far: of the 15 RAID1 arrays only
1 is checked at a time, none in parallel.

Disclaimer: JBOD may work better with SSDs/NVMes - untested ;-)

Best regards,
Hubert
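A 5 x 3 distributed-replicate layout like the one Hubert describes
could be created roughly as follows (hypothetical hostnames, volume
name and brick paths; each /bricks/rN directory is assumed to sit on
its own RAID1 pair):

    # Bricks are grouped into replica sets in the order given, so each
    # set of three consecutive bricks spans the three servers
    gluster volume create highio replica 3 \
        srv1:/bricks/r1/brick srv2:/bricks/r1/brick srv3:/bricks/r1/brick \
        srv1:/bricks/r2/brick srv2:/bricks/r2/brick srv3:/bricks/r2/brick \
        srv1:/bricks/r3/brick srv2:/bricks/r3/brick srv3:/bricks/r3/brick \
        srv1:/bricks/r4/brick srv2:/bricks/r4/brick srv3:/bricks/r4/brick \
        srv1:/bricks/r5/brick srv2:/bricks/r5/brick srv3:/bricks/r5/brick
    gluster volume start highio
    gluster volume info highio        # verify the 5 x 3 brick layout

With this layout a single disk failure is absorbed by the local RAID1
resync, while Gluster-level healing is only needed if an entire brick
(both mirror members or the filesystem on top) is lost.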