Strahil Nikolov
2023-Mar-21 00:27 UTC
[Gluster-users] hardware issues and new server advice
Generally, the recommended approach is to have 4TB disks and no more than 10-12 per HW RAID. Of course, it's not always possible, but a resync of a failed 14TB drive will take eons. I'm not sure if the Ryzens can support ECC memory, but if they do - go for it. In both scenarios, always align the upper layers (LVM, FS) with the stripe width and stripe size.

What kind of workload do you have?

Best Regards,
Strahil Nikolov

On Sat, Mar 18, 2023 at 14:36, Martin Bähr <mbaehr+gluster at realss.com> wrote:

> hi,
>
> our current servers are suffering from a weird hardware issue that
> forces us to start over.
>
> in short we have two servers with 15 disks at 6TB each, divided into
> three raid5 arrays for three bricks per server at 22TB per brick.
> each brick on one server is replicated to a brick on the second
> server.
>
> the hardware issue is that somewhere in the backplane random I/O
> errors happen when the system is under load. these cause the raid to
> fail disks, although the disks themselves are perfectly fine.
> reintegration of the disks causes more load and is therefore
> difficult. we have been running these servers for at least four
> years, and the problem only started appearing about three months ago.
>
> our hosting provider acknowledged the issue but does not support
> moving the disks to different servers. (they replaced the hardware,
> but that didn't help.) so we need to start over.
>
> my first intuition was that we should have smaller servers with fewer
> disks to avoid repeating the above scenario. we also previously had
> issues with the load created by raid resync, so we are considering
> skipping raid altogether and relying on gluster replication instead
> (compensating with three replicas per brick instead of two).
>
> our options are:
>
> 6 of these:
> AMD Ryzen 5 Pro 3600 - 6c/12t - 3.6GHz/4.2GHz
> 32GB - 128GB RAM
> 4 or 6 × 6TB HDD SATA 6Gbit/s
>
> or three of these:
> AMD Ryzen 7 Pro 3700 - 8c/16t - 3.6GHz/4.4GHz
> 32GB - 128GB RAM
> 6 × 14TB HDD SAS 6Gbit/s
>
> i would configure 5 bricks on each server (leaving one disk as a hot
> spare).
>
> the engineers prefer the second option due to the architecture and
> SAS disks. it is also cheaper. i am concerned that 14TB disks will
> take too long to heal if one ever has to be replaced and would favor
> the smaller disks.
>
> the other question is: is skipping raid a good idea?
>
> greetings, martin.
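As a concrete illustration of the alignment Strahil mentions, here is a minimal sketch for a hypothetical 12-disk hardware RAID5 with a 256KiB chunk size (device names, volume names, and the chunk size are all assumptions, not a prescription):

    # 12-disk RAID5 => 11 data disks; full stripe = 11 × 256KiB = 2816KiB
    pvcreate --dataalignment 2816k /dev/sdb
    vgcreate vg_brick /dev/sdb
    lvcreate -n lv_brick -l 100%FREE vg_brick
    # xfs: su = RAID chunk size, sw = number of data disks;
    # 512-byte inodes leave room for gluster's extended attributes
    mkfs.xfs -i size=512 -d su=256k,sw=11 /dev/vg_brick/lv_brick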
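And for the raid-less, replica-3 layout Martin is weighing, the volume would be created along these lines (hostnames, volume name, and brick paths are hypothetical; one JBOD disk backs each brick):

    # on each of server1, server2, server3:
    mkfs.xfs -i size=512 /dev/sdb
    mkdir -p /data/brick1
    mount /dev/sdb /data/brick1
    mkdir -p /data/brick1/gv0

    # on one server, after peer-probing the other two:
    gluster volume create gv0 replica 3 \
        server1:/data/brick1/gv0 \
        server2:/data/brick1/gv0 \
        server3:/data/brick1/gv0
    gluster volume start gv0

With replica 3 and no RAID, a failed disk means gluster itself re-copies the whole brick from the remaining replicas, which is exactly the heal-time trade-off Martin raises.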
Excerpts from Strahil Nikolov's message of 2023-03-21 00:27:58 +0000:

> Generally, the recommended approach is to have 4TB disks and no more
> than 10-12 per HW RAID.

what kind of raid configuration and brick size do you recommend here?

> Of course, it's not always possible, but a resync of a failed 14TB
> drive will take eons.

right, that is my concern too. but with raid you tend to get even
larger bricks. i have the impression that a full brick replacement is
to be avoided. on the other hand i saw recommendations to reset a brick
when the filesystem is damaged. isn't that the equivalent of a full
brick replacement?

> What kind of workload do you have?

the primary data is photos. we get an average of 50000 new files per
day, with a peak of 7 to 8 times as much during christmas. gluster has
always been able to keep up with that; only when raid resyncs or checks
happen does the server load sometimes increase enough to cause issues.

that is, our primary failure point is hardware. years ago we had
problems with bad disks, now it is the backplane, causing us to rebuild
gluster from scratch now for the third time. i was really hoping to
build a system that just grows and doesn't force us to move the whole
(continuously growing) dataset to new servers.

how should we build servers to actually last?

greetings, martin.
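On the reset-brick question: the command reinitializes a brick in place and lets self-heal repopulate it from the other replicas, so in terms of data moved it is effectively a full brick rebuild. A sketch, reusing the hypothetical names from above:

    # take the failing brick offline
    gluster volume reset-brick gv0 server1:/data/brick1/gv0 start
    # ...replace the disk, recreate and remount the filesystem...
    # bring the same path back; self-heal copies everything over again
    gluster volume reset-brick gv0 server1:/data/brick1/gv0 \
        server1:/data/brick1/gv0 commit force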
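Whichever disk size is chosen, heal progress after such a rebuild can at least be measured rather than guessed at (volume name assumed as before):

    # per-file listing of entries still pending heal
    gluster volume heal gv0 info
    # condensed per-brick summary (gluster >= 3.13)
    gluster volume heal gv0 info summary
    # just the pending-entry counts per brick
    gluster volume heal gv0 statistics heal-count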