Erik Jacobson
2021-Mar-22 15:54 UTC
[Gluster-users] Gluster usage scenarios in HPC cluster management
> > The stuff I work on doesn't use containers much (unlike a different
> > system also at HPE).
> By "pods" I meant "glusterd instance", a server hosting a collection of
> bricks.

Oh ok. The term is overloaded in my world.

> > I don't have a recipe, they've just always been beefy enough for
> > gluster. Sorry I don't have a more scientific answer.
> Seems that 64GB RAM are not enough for a pod with 26 glusterfsd
> instances and no other services (except sshd for management). What do
> you mean by "beefy enough"? 128GB RAM or 1TB?

We are currently using replica-3 but may also support replica-5 in the
future.

So if you had 24 leaders like HLRS, there would be 8 replica-3 at the
bottom layer, and then distributed across. (replicated/distributed
volumes)

So we would have 24 leader nodes, each leader would have a disk serving
4 bricks (one of which is simply a lock FS for CTDB, one is sharded,
one is for logs, and one is heavily optimized for non-object expanded
tree NFS). The term "disk" is loose.

So each SU Leader (or gluster server) serving the 4 volumes, in the 8x3
configuration, in our world has some differences in CPU type, memory,
and storage depending on order, preferences, and timing (things always
move forward).

On an SU Leader, we typically do 2 RAID10 volumes with a RAID controller
including cache. However, we have moved to RAID1 in some cases with
better disks. Leaders store a lot of non-gluster stuff on "root" and
then gluster has a dedicated disk/LUN.

We have been trying to improve our helper tools to 100% wheel out a bad
leader (say it melted into the floor) and replace it. Once we have that
solid, and because our monitoring data on the "root" drive is already
redundant, we plan to move newer servers to two NVMe drives without
RAID: one for gluster and one for the OS.

If a leader melts into the floor, we have a procedure to discover a new
node for that, install the base OS including gluster/CTDB/etc, and then
run a tool to re-integrate it into the cluster as an SU Leader node
again and do the healing. Separately, monitoring data outside of
gluster will heal.

PS: I will note that I have a mini-SU-leader cluster on my desktop
(qemu/libvirt) for development. It is a 1x3 set of SU Leaders, one head
node, and one compute node. I make an adjustment to reduce the gluster
cache to fit in the memory space. Works fine. Not real fast but good
enough for development.

Specs of a leader node at a customer site:
* 256G RAM
* Storage:
  - MR9361-8i controller
  - 7681GB root LUN (RAID1)
  - 15.4 TB for gluster bricks (RAID10)
  - 6 SATA SSD MZ7LH7T6HMLA-00005
* AMD EPYC 7702 64-Core Processor
  - CPU(s): 128
  - On-line CPU(s) list: 0-127
  - Thread(s) per core: 2
  - Core(s) per socket: 64
  - Socket(s): 1
  - NUMA node(s): 4
* Management Ethernet
  - Gluster and cluster management co-mingled
  - 2x40G (but 2x10G would be fine)
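[Editor's note: to make the "8 replica-3 at the bottom layer, distributed
across 24 leaders" layout concrete, here is a minimal sketch of how such a
volume could be created. The hostnames (leader1..leader24), volume name
(cm_shared), and brick paths are illustrative assumptions, not the actual
HPE tooling or site configuration.]

  # Probe the other leaders into the trusted pool (run from leader1).
  for i in $(seq 2 24); do gluster peer probe leader$i; done

  # 24 bricks with "replica 3": consecutive bricks form each replica set,
  # so this yields 8 replica-3 subvolumes, distributed across.
  gluster volume create cm_shared replica 3 \
      leader{1..24}:/data/glusterfs/cm_shared/brick

  # The sharded volume mentioned above is just a regular volume with the
  # shard feature enabled:
  gluster volume set cm_shared features.shard on
  gluster volume start cm_shared

The CTDB lock volume, the log volume, and the NFS-optimized volume would
each be separate volumes built the same way from bricks on the same
leader disk/LUN.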
Diego Zuccato
2021-Mar-23 08:01 UTC
[Gluster-users] Gluster usage scenarios in HPC cluster management
On 22/03/21 16:54, Erik Jacobson wrote:

> So if you had 24 leaders like HLRS, there would be 8 replica-3 at the
> bottom layer, and then distributed across. (replicated/distributed
> volumes)

I still have to grasp the "leader node" concept. Weren't gluster nodes
"peers"? Or by "leader" do you mean that it's mentioned in the fstab
entry like
  /l1,l2,l3:gv0 /mnt/gv0 glusterfs defaults 0 0
while the peer list includes l1, l2, l3 and a bunch of other nodes?

> So we would have 24 leader nodes, each leader would have a disk serving
> 4 bricks (one of which is simply a lock FS for CTDB, one is sharded,
> one is for logs, and one is heavily optimized for non-object expanded
> tree NFS). The term "disk" is loose.

That's a system way bigger than ours (3 nodes, replica 3 arbiter 1, up
to 36 bricks per node).

> Specs of a leader node at a customer site:
> * 256G RAM

Glip! 256G for 4 bricks... No wonder I have had trouble running 26
bricks in 64GB RAM... :)

--
Diego Zuccato
DIFA - Dip. di Fisica e Astronomia
Servizi Informatici
Alma Mater Studiorum - Università di Bologna
V.le Berti-Pichat 6/2 - 40127 Bologna - Italy
tel.: +39 051 20 95786
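[Editor's note: for readers puzzling over the fstab line Diego quotes, a
more common way to write a fault-tolerant client mount is a single volfile
server plus backups; only the brick servers need to be peers in the trusted
pool, while clients that merely mount the volume never appear in the peer
list. A minimal sketch, with illustrative hostnames (leader1..leader3) and
the volume name gv0 assumed from the thread:]

  # /etc/fstab on a compute/client node
  leader1:/gv0  /mnt/gv0  glusterfs  defaults,_netdev,backup-volfile-servers=leader2:leader3  0 0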