thr3ads.net - Gluster users - [Gluster-users] hardware issues and new server advice [Mar 2023]

Martin Bähr

2023-Mar-18 12:38 UTC

[Gluster-users] hardware issues and new server advice

hi,

our current servers are suffering from a weird hardware issue that
forces us to start over.

in short we have two servers with 15 disks at 6TB each, divided into
three raid5 arrays for three bricks per server at 22TB per brick.
each brick on one server is replicated to a brick on the second server.
the hardware issue is that somewhere in the backplane random I/O errors
happen when the system is under load. these cause the raid to fail
disks, although the disks themselves are perfectly fine. reintegration
of the disks causes more load and is therefore difficult.
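As a rough check of the layout described above, the following sketch computes the usable size of one raid5 array. It assumes the 15 disks are split evenly into three 5-disk arrays and that the quoted "22TB" is the binary (TiB) size of a 24 TB array; neither is stated explicitly in the message.

```python
# Sketch only: the even 5-disk split and the TB-vs-TiB reading are assumptions.
TB = 10**12   # decimal terabyte, as used on disk labels
TIB = 2**40   # binary tebibyte, as reported by many tools


def raid5_usable_bytes(disks: int, disk_bytes: int) -> int:
    """RAID5 stores one disk's worth of parity per array."""
    if disks < 3:
        raise ValueError("raid5 needs at least three disks")
    return (disks - 1) * disk_bytes


disks_per_array = 15 // 3                      # three arrays per server
usable = raid5_usable_bytes(disks_per_array, 6 * TB)

print(usable // TB)            # 24 (decimal TB)
print(round(usable / TIB, 1))  # 21.8 (TiB), close to the quoted 22TB per brick
```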

we have been running these servers for at least four years, and the problem
only started appearing about three months ago.
our hosting provider acknowledged the issue but does not support moving
the disks to different servers. (they replaced the hardware but that
didn't help.)

so we need to start over.

my first intuition was that we should have smaller servers with fewer
disks to avoid repeating the above scenario.

we also previously had issues with the load created by raid resync, so we
are considering skipping raid altogether and relying on gluster replication
instead (compensating with three replicas per brick instead of two).

our options are:

6 of these:
AMD Ryzen 5 Pro 3600 - 6c/12t - 3.6GHz/4.2GHz
32GB - 128GB RAM
4 or 6 × 6TB HDD SATA 6Gbit/s

or three of these:
AMD Ryzen 7 Pro 3700 - 8c/16t - 3.6GHz/4.4GHz
32GB - 128GB RAM
6 × 14TB HDD SAS 6Gbit/s

i would configure 5 bricks on each server (leaving one disk as a hot
spare)
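To compare the two options on capacity, here is a small sketch. It assumes one disk per brick, 5 bricks per server in both options (so the 6-disk variant of the first option), plain replica 3 without an arbiter, and decimal TB; the helper name `usable_tb` is made up for illustration.

```python
# Sketch only: brick layout and replica count follow the assumptions above.
def usable_tb(servers: int, bricks_per_server: int, disk_tb: int, replica: int) -> float:
    """Usable capacity of a replicated volume with one disk per brick."""
    raw_tb = servers * bricks_per_server * disk_tb
    return raw_tb / replica


option_a = usable_tb(servers=6, bricks_per_server=5, disk_tb=6, replica=3)
option_b = usable_tb(servers=3, bricks_per_server=5, disk_tb=14, replica=3)

print(option_a)  # 60.0 TB usable from six small servers
print(option_b)  # 70.0 TB usable from three large servers
```

Under these assumptions the three larger servers give slightly more usable space, but every replica set then spans all three machines, so losing one server leaves all data on two copies.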

the engineers prefer the second option due to the architecture and SAS
disks. it is also cheaper.

i am concerned that 14TB disks will take too long to heal if one ever has
to be replaced and would favor the smaller disks.
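The heal-time concern can be put into rough numbers. The sketch below gives a lower bound for refilling a whole disk at a sustained rate; the 100 MB/s figure is purely an assumption, and real heals depend on how full the brick is, on file sizes, and on the load from clients.

```python
# Sketch only: 100 MB/s sustained throughput is an assumed figure, not a measurement.
def hours_to_refill(disk_tb: float, mb_per_s: float) -> float:
    """Hours needed to write disk_tb decimal terabytes at mb_per_s megabytes/second."""
    seconds = (disk_tb * 1e12) / (mb_per_s * 1e6)
    return seconds / 3600


for size_tb in (6, 14):
    print(size_tb, "TB:", round(hours_to_refill(size_tb, 100), 1), "hours")
# 6 TB: 16.7 hours
# 14 TB: 38.9 hours
```

The ratio between the two disk sizes stays the same at any throughput: a full 14TB disk takes about 2.3 times as long as a full 6TB disk.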

the other question is, is skipping raid a good idea?

greetings, martin.

Strahil Nikolov

2023-Mar-21 00:27 UTC

[Gluster-users] hardware issues and new server advice

Generally, the recommended approach is to have 4TB disks and no more than 10-12
per HW RAID. Of course, it's not always possible, but a resync of a failed 14TB
drive will take eons.
I'm not sure if the Ryzens can support ECC memory, but if they do - go for
it.
In both scenarios, always align the upper layers (LVM, FS) with the stripe
width and stripe size.
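As an illustration of the alignment advice, the sketch below derives the values usually passed to the filesystem tools from the RAID geometry: the ext4 `stride` and `stripe_width` extended options and the XFS `su` and `sw` data options. The 256 KiB chunk size and the 5-disk raid5 layout are assumptions chosen for the example; the real values must be read from the controller or md configuration.

```python
# Sketch only: chunk size and array layout below are example assumptions.
def alignment(chunk_kib: int, total_disks: int, parity_disks: int,
              fs_block_kib: int = 4) -> dict:
    """Derive filesystem alignment values from a striped RAID geometry."""
    data_disks = total_disks - parity_disks
    if data_disks < 1:
        raise ValueError("array needs at least one data disk")
    stride = chunk_kib // fs_block_kib           # filesystem blocks per chunk
    return {
        "data_disks": data_disks,
        "full_stripe_kib": chunk_kib * data_disks,
        "ext4_stride": stride,
        "ext4_stripe_width": stride * data_disks,
        "xfs_su_kib": chunk_kib,                 # stripe unit
        "xfs_sw": data_disks,                    # stripe width in units
    }


geometry = alignment(chunk_kib=256, total_disks=5, parity_disks=1)
print(geometry)
```

The full stripe size (1024 KiB in this example) is also the value to align LVM physical volumes to, for instance through the `--dataalignment` option of `pvcreate`.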
What kind of workload do you have?
Best Regards,
Strahil Nikolov


