On 21 June 2020, 10:53:10 GMT+03:00, Gionatan Danti <g.danti at assyoma.it> wrote:

> On 2020-06-21 01:26, Strahil Nikolov wrote:
>> The effort is far less than reconstructing the disk of a VM from
>> Ceph. In Gluster, just run a find on the brick searching for the
>> name of the VM disk and you will find the VM_IMAGE.xyz (where xyz is
>> just a number) and then concatenate the list into a single file.
>
> Sure, but it is somewhat impractical with a 6 TB fileserver image and
> 500 users screaming for their files ;)
> And I fully expect the reconstruction to be much easier than with Ceph,
> but from what I read, Ceph is less likely to break in the first place.
> But I admit I never seriously ran a Ceph cluster, so maybe it is more
> fragile than I expect.

With every community project you are in the position of a beta tester - no matter whether it is Fedora, Gluster or Ceph. So far I have had issues with upstream projects only during and immediately after patching - but this is properly mitigated with a reasonable patching strategy (patch the test environment and, several months later, patch prod with the same repos).
Enterprise Linux also breaks (and a lot), despite having ten times more users and use cases, so you cannot start using Gluster and assume that a free project won't break at all.
Our part in this project is to help the devs create a test case for our workload, so that regressions are reduced to a minimum.

In the past 2 years we have had 2 major issues with VMware vSAN and 1 major issue with an enterprise storage cluster (both solutions are quite expensive) - so I always recommend proper testing of your software.

>> That's true, but you could also use NFS Ganesha, which is
>> more performant than FUSE and just as reliable.
>
> From this very list I read about many users with various problems when
> using NFS Ganesha. Is that a wrong impression?

From my observations, almost nobody is complaining about Ganesha on the mailing list -> 50% are having issues with geo-replication, 20% are having issues with small-file performance, and the rest have issues with very old versions of Gluster -> v5 or older.

>> It's not so hard to do it - just use either 'reset-brick' or
>> 'replace-brick'.
>
> Sure - the command itself is simple enough. The point is that each
> reconstruction is quite a bit riskier than a simple RAID
> reconstruction. Do you run a full Gluster SDS, skipping RAID? How do
> you find this setup?

I can't say that a replace-brick on a 'replica 3' volume is riskier than a rebuild of a RAID, but I have noticed that nobody is following Red Hat's guide to use either:
- a RAID6 of 12 disks (2-3 TB each)
- a RAID10 of 12 disks (2-3 TB each)
- JBOD disks in 'replica 3' mode (I'm not sure about the size RH recommends, most probably 2-3 TB)
So far, I haven't had the opportunity to run on JBODs.

> Thanks.
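For reference, a minimal and untested sketch of the reconstruction described above. It assumes a sharded volume, where the first chunk of the image sits at its normal path on the brick and the remaining pieces live under <brick>/.shard named after the base file's GFID (rather than after the image name); the brick path, image name and output path are made up for illustration:

    #!/bin/bash
    # Illustrative values - adjust to your environment. Run as root on ONE healthy replica.
    BRICK=/gluster/brick1/brick          # brick root
    OUT=/recovery/vm1.qcow2              # where to assemble the image

    # Locate the base file on the brick (skip gluster's internal trees).
    IMG=$(find "$BRICK" -name 'vm1.qcow2' \
            -not -path '*/.glusterfs/*' -not -path '*/.shard/*')

    # The shard pieces are keyed by the base file's GFID, not by its name.
    HEX=$(getfattr -n trusted.gfid -e hex "$IMG" | awk -F= '/trusted.gfid/ {print $2}')
    GFID=$(echo "${HEX#0x}" | sed -E 's/(.{8})(.{4})(.{4})(.{4})(.{12})/\1-\2-\3-\4-\5/')

    # Base file first, then the shards in numeric order.
    cp "$IMG" "$OUT"
    for piece in $(ls "$BRICK/.shard" | grep "^$GFID\." | sort -t. -k2 -n); do
        cat "$BRICK/.shard/$piece" >> "$OUT"
    done

Note that holes in a sparse image may have no shard file at all, so plain concatenation can misalign the data if shard indices are missing; verify the result (for example with qemu-img check) before handing it back to users.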
I agree with this assessment for the most part. I'll just add that, during development of Gluster-based solutions, we had internal use of Red Hat Gluster. This was over a year and a half ago when we started. For my perhaps non-mainstream use cases, I found the latest versions of Gluster 7 actually fixed several of my issues. Now, I did not try to work with Red Hat when I hit problems, as it was only "non-shippable support" - we could install it but not deliver it. Since it didn't work well for our strange use cases, we moved on to building our own Gluster instead of working to have customers buy the Red Hat one. (We also support sles12, sles15, rhel7, rhel8 - so having Red Hat's version of Gluster wouldn't really have worked out for us anyway.)

However, I also found that it is quite easy for my use case to hit new bugs. When we go from gluster72 to one of the newer ones, little things might happen (and did happen). I don't complain, because I get free support from you, and I do my best to fix them if I have time and access to a failing system. A tricky thing in my world is that we will sell a cluster with 5,000 nodes to boot, while my test cluster may have 3 nodes. I can get time on up to 128 nodes on one test system, but I only get short-term access to bigger systems at the factory. So changing from one Gluster version to another is a real challenge for us, because there simply is no way for us to test very often and, as is normal in HPC, problems only show at scale. :)

This is also why we are still using Gluster NFS. We know we need to work with the community on fixing some Ganesha issues, but the amount of time we get on a large machine that exhibits the problem is short, and we must prioritize. This is why I'm careful to never "blame Ganesha" but rather point out that we haven't had time to track the issues down with the Ganesha community. Meanwhile we hope we can keep building Gluster NFS :)

When I next do a version change of Gluster or try Ganesha again, it will be when I have sustained access to at least a 1024-node cluster to boot, with 3 or 6 Gluster servers, to really work out any issues. I consider this "a cost of doing business in the world I work in", but it is a real challenge indeed. I assume Gluster developers face some parallel challenges... "works fine on my limited hardware or virtual machines".

Erik
On 2020-06-21 14:20, Strahil Nikolov wrote:
> With every community project you are in the position of a beta
> tester - no matter whether it is Fedora, Gluster or Ceph. So far I have
> had issues with upstream projects only during and immediately after
> patching - but this is properly mitigated with a reasonable
> patching strategy (patch the test environment and, several months later,
> patch prod with the same repos).
> Enterprise Linux also breaks (and a lot), despite having ten times more
> users and use cases, so you cannot start using Gluster and assume
> that a free project won't break at all.
> Our part in this project is to help the devs create a test case for
> our workload, so that regressions are reduced to a minimum.

Well, this is true, and both devs & community deserve a big thanks for all the work done.

> In the past 2 years we have had 2 major issues with VMware vSAN and 1
> major issue with an enterprise storage cluster (both solutions are
> quite expensive) - so I always recommend proper testing of your
> software.

Interesting, I am almost tempted to ask you what issue you had with vSAN, but this is not the right mailing list ;)

> From my observations, almost nobody is complaining about Ganesha on
> the mailing list -> 50% are having issues with geo-replication, 20%
> are having issues with small-file performance, and the rest have
> issues with very old versions of Gluster -> v5 or older.

Mmm, I could swear I have read quite a few posts where the problem was solved by migrating away from NFS Ganesha.
Still, for a hyperconverged setup a problem remains: NFS on loopback/localhost is not 100% supported (or, at least, RH is not willing to declare it supportable/production ready [1]). A FUSE mount would be the more natural way to access the underlying data.

> I can't say that a replace-brick on a 'replica 3' volume is
> riskier than a rebuild of a RAID, but I have noticed that nobody is
> following Red Hat's guide to use either:
> - a RAID6 of 12 disks (2-3 TB each)
> - a RAID10 of 12 disks (2-3 TB each)
> - JBOD disks in 'replica 3' mode (I'm not sure about the size RH
> recommends, most probably 2-3 TB)
> So far, I haven't had the opportunity to run on JBODs.

For the RAID6/10 setup I found no issues: simply replace the broken disk without involving Gluster at all. However, this also means facing the "IOPS wall" I described earlier for a single-brick node.
Going full-Gluster with JBODs would be interesting from a performance standpoint, but it complicates eventual recovery from bad disks.

Does anyone use Gluster in JBOD mode? If so, can you share your experience?
Thanks.

[1] https://access.redhat.com/solutions/22231 (account required)
[2] https://bugzilla.redhat.com/show_bug.cgi?id=489889 (old, but I cannot find anything newer)

--
Danti Gionatan
Technical Support
Assyoma S.r.l. - www.assyoma.it
email: g.danti at assyoma.it - info at assyoma.it
GPG public key ID: FF5F32A8
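As a follow-up to the JBOD question, here is a hedged sketch of how a failed JBOD brick could be swapped on a 'replica 3' volume, letting self-heal rebuild it from the two healthy copies. Volume name, hostname, device and paths are invented for illustration; check the gluster CLI documentation for your version for the exact syntax:

    # On the node with the failed disk: prepare a fresh XFS brick
    # (inode size 512 is the commonly recommended setting for Gluster bricks).
    mkfs.xfs -f -i size=512 /dev/sdX
    mkdir -p /gluster/new1
    mount /dev/sdX /gluster/new1
    mkdir -p /gluster/new1/brick

    # Point the volume at the new brick; AFR self-heal then copies the
    # data back from the two healthy replicas.
    gluster volume replace-brick myvol \
        server3:/gluster/old1/brick server3:/gluster/new1/brick \
        commit force

    # Track healing until the pending entry count drops to zero.
    gluster volume heal myvol info summary

If the replacement disk is mounted back at the same path, 'reset-brick' can be used instead of 'replace-brick'; either way the heal has to copy the brick's entire contents over the network, which is the longer-rebuild trade-off compared to a local RAID resync discussed above.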