Erik Jacobson
2021-Mar-22 15:54 UTC
[Gluster-users] Gluster usage scenarios in HPC cluster management
> > The stuff I work on doesn't use containers much (unlike a different
> > system also at HPE).
> By "pods" I meant "glusterd instance", a server hosting a collection of
> bricks.

Oh ok. The term is overloaded in my world.

> > I don't have a recipe, they've just always been beefy enough for
> > gluster. Sorry I don't have a more scientific answer.
> Seems that 64GB RAM are not enough for a pod with 26 glusterfsd
> instances and no other services (except sshd for management). What do
> you mean by "beefy enough"? 128GB RAM or 1TB?

We are currently using replica-3 but may also support replica-5 in the
future.

So if you had 24 leaders like HLRS, there would be 8 replica-3 at the
bottom layer, and then distributed across. (replicated/distributed
volumes)

So we would have 24 leader nodes, each leader would have a disk serving
4 bricks (one of which is simply a lock FS for CTDB, one is sharded,
one is for logs, and one is heavily optimized for non-object expanded
tree NFS). The term "disk" is loose.

So each SU Leader (or gluster server) serving the 4 volumes, in the 8x3
configuration, in our world has some differences in CPU type, memory,
and storage depending on order, preferences, and timing (things always
move forward).

On an SU Leader, we typically do 2 RAID10 volumes with a RAID controller
including cache. However, we have moved to RAID1 in some cases with
better disks. Leaders store a lot of non-gluster stuff on "root" and
then gluster has a dedicated disk/LUN.

We have been trying to improve our helper tools to 100% wheel out a bad
leader (say it melted into the floor) and replace it. Once we have that
solid, and because our monitoring data on the "root" drive is already
redundant, we plan to move newer servers to two NVMe drives without
RAID: one for gluster and one for the OS.

If a leader melts into the floor, we have a procedure to discover a new
node for that, install the base OS including gluster/CTDB/etc, and then
run a tool to re-integrate it into the cluster as an SU Leader node
again and do the healing. Separately, monitoring data outside of
gluster will heal.

PS: I will note that I have a mini-SU-leader cluster on my desktop
(qemu/libvirt) for development. It is a 1x3 set of SU Leaders, one head
node, and one compute node. I make an adjustment to reduce the gluster
cache to fit in the memory space. Works fine. Not real fast but good
enough for development.

Specs of a leader node at a customer site:
* 256G RAM
* Storage:
  - MR9361-8i controller
  - 7681GB root LUN (RAID1)
  - 15.4 TB for gluster bricks (RAID10)
  - 6 SATA SSD MZ7LH7T6HMLA-00005
* AMD EPYC 7702 64-Core Processor
  - CPU(s): 128
  - On-line CPU(s) list: 0-127
  - Thread(s) per core: 2
  - Core(s) per socket: 64
  - Socket(s): 1
  - NUMA node(s): 4
* Management Ethernet
  - Gluster and cluster management co-mingled
  - 2x40G (but 2x10G would be fine)
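[Editor's note: to make the "8 replica-3 at the bottom layer, distributed
across 24 leaders" layout concrete, here is a minimal sketch of how such a
volume could be created. The hostnames (leader1..leader24), volume name
(cm_shared), and brick paths are illustrative assumptions, not the actual
HPE tooling or site configuration.]

  # Probe the other leaders into the trusted pool (run from leader1).
  for i in $(seq 2 24); do gluster peer probe leader$i; done

  # 24 bricks with "replica 3": consecutive bricks form each replica set,
  # so this yields 8 replica-3 subvolumes, distributed across.
  gluster volume create cm_shared replica 3 \
      leader{1..24}:/data/glusterfs/cm_shared/brick

  # The sharded volume mentioned above is just a regular volume with the
  # shard feature enabled:
  gluster volume set cm_shared features.shard on
  gluster volume start cm_shared

The CTDB lock volume, the log volume, and the NFS-optimized volume would
each be separate volumes built the same way from bricks on the same
leader disk/LUN.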
Diego Zuccato
2021-Mar-23 08:01 UTC
[Gluster-users] Gluster usage scenarios in HPC cluster management
On 22/03/21 16:54, Erik Jacobson wrote:

> So if you had 24 leaders like HLRS, there would be 8 replica-3 at the
> bottom layer, and then distributed across. (replicated/distributed
> volumes)

I still have to grasp the "leader node" concept. Weren't gluster nodes
"peers"? Or by "leader" do you mean that it's mentioned in the fstab
entry like
  /l1,l2,l3:gv0 /mnt/gv0 glusterfs defaults 0 0
while the peer list includes l1, l2, l3 and a bunch of other nodes?

> So we would have 24 leader nodes, each leader would have a disk serving
> 4 bricks (one of which is simply a lock FS for CTDB, one is sharded,
> one is for logs, and one is heavily optimized for non-object expanded
> tree NFS). The term "disk" is loose.

That's a system way bigger than ours (3 nodes, replica 3 arbiter 1, up
to 36 bricks per node).

> Specs of a leader node at a customer site:
> * 256G RAM

Glip! 256G for 4 bricks... No wonder I have had trouble running 26
bricks in 64GB RAM... :)

--
Diego Zuccato
DIFA - Dip. di Fisica e Astronomia
Servizi Informatici
Alma Mater Studiorum - Università di Bologna
V.le Berti-Pichat 6/2 - 40127 Bologna - Italy
tel.: +39 051 20 95786
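[Editor's note: for readers puzzling over the fstab line Diego quotes, a
more common way to write a fault-tolerant client mount is a single volfile
server plus backups; only the brick servers need to be peers in the trusted
pool, while clients that merely mount the volume never appear in the peer
list. A minimal sketch, with illustrative hostnames (leader1..leader3) and
the volume name gv0 assumed from the thread:]

  # /etc/fstab on a compute/client node
  leader1:/gv0  /mnt/gv0  glusterfs  defaults,_netdev,backup-volfile-servers=leader2:leader3  0 0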