Hello,
>4 cores is quite low, especially when healing.
The 4 cores (and, by default, 8GB RAM) is the standard offering in our
situation. It is up to the specific usage of our end-users to determine whether
that is enough (most deployed Gluster clusters in our environment average
around 5% total usage, so it does seem to be quite enough). Even this particular
cluster hardly ever goes above 10-15%, except when rebalancing after adding
bricks (then it shoots up to 80% for the several hours the rebalance takes).
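(For anyone interested: during those windows the progress can simply be
followed with the standard rebalance status command; the volume name below is
just a placeholder.)

    # check how far a rebalance has progressed after an add-brick
    gluster volume rebalance myvol status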
>Why not FUSE ? Ganesha is suitable for UNIX and BSD systems that do not
support FUSE.
When we designed our offering we did have a hard time choosing the default:
FUSE vs NFS. Since we have a (very) large environment, lots of
network segments, layer-7 firewalls across subnets, and a variety of possible
clients (Windows, AIX, Solaris, Linux), we opted for NFS (via Ganesha). Every OS
can handle NFS, and by sticking to NFSv4.0 over TCP, opening firewalls becomes a
lot simpler: only TCP/2049 is needed (otherwise all of the different ports for
the brick connections plus glusterfsd itself would have to be opened).
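As an illustration (server name, export and mount paths below are just
placeholders), a client mount pinned to NFSv4.0 over TCP looks roughly like:

    # /etc/fstab entry: force NFSv4.0 over TCP so only TCP/2049 has to be opened
    ganesha-vip.example.com:/data  /mnt/data  nfs  vers=4.0,proto=tcp,hard  0 0

    # firewall side (firewalld example): a single port to allow
    firewall-cmd --permanent --add-port=2049/tcp && firewall-cmd --reload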
>Consider increasing the 'token' and 'consensus' to more
meaningful values -> start with a 10s token, for example.
That is actually something we had not yet looked at, thanks for the suggestion.
We would need to test this, but it sounds like a good recommendation
(currently they're at Red Hat's defaults).
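If I understand the suggestion correctly, it would be something along these
lines in the totem section of /etc/corosync/corosync.conf (the values are just
the suggested starting point, not something we have validated yet):

    totem {
        ...
        token: 10000        # 10 second token instead of the Red Hat default
        consensus: 12000    # corosync defaults this to 1.2 * token when unset
        ...
    }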
>For performance improvements, I would add some SSDs in the game (tier 1+
storage) and use the SSD-based LUNs for LVM caching.
As much as we'd like to, that is unfortunately not possible in our environment.
We use a 'private cloud' (which is not even a cloud, just a beefy VMware
environment), and each tenant/consumer gets the same type of resources.
Problems of a large (and often sluggish) financial company...
It currently hosts almost 20,000 VM instances in total (80% RHEL based), and
among those approximately 55 Gluster clusters.
Customizing the corosync values to somewhat larger timeouts does sound like it
could help in this case (less busy Gluster clusters seem to cope well), thanks
for this suggestion!
Regards,
Nico
----- Original Message -----
From: "Strahil Nikolov" <hunter86_bg at yahoo.com>
To: "gluster-users" <gluster-users at gluster.org>, "Nico
van Royen" <nico at van-royen.nl>
Sent: Monday, October 19, 2020 05:56:20
Subject: Re: [Gluster-users] Setup recommendations
>Size is not that big, 600GB space with around half of that actually used.
GlusterFS servers themselves each have 4 cores and 12GB memory. It might also
be important to note that these are VMware hosted nodes that make use of SAN
storage for the datastores.
4 cores is quite low, especially when healing.
>Connected to that NFS (ganesha) exported share are just over 100 clients,
all RHEL6 and RHEL7, some spanning 10 network hops away. All of those clients
are (currently) using the same virtual-IP, so all end up on the same server.
Why not FUSE ? Ganesha is suitable for UNIX and BSD systems that do not support
FUSE.
>Note that I mentioned 'should', since at times it had anywhere
between 250,000 and 1 million files in it (which of course is not advised).
Using some kind of hashing (subfolders spread per day/hour etc.) was also already
advised.
If you have multiple subvolumes (going from replicate to distributed-replicated),
you can also spread the load - yet 'find' won't get any faster :)
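Roughly (volume, host and brick names are just placeholders), going from a
plain 'replica 3' to a 2 x 3 distributed-replicate means adding one more full
replica set and rebalancing:

    # add a second replica set -> volume becomes 2 x 3 distributed-replicate
    gluster volume add-brick myvol host4:/bricks/b1 host5:/bricks/b1 host6:/bricks/b1
    # spread the existing files over both replica sets
    gluster volume rebalance myvol start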
>Problems that are often seen:
>- Any kind of operation on VMware such as a vMotion, creating a VM snapshot
etc. on the node that has these 100+ clients connected causes such a temporary
pause that pacemaker decides to switch the resources (causing a failover of the
virtual IP address, thus connected clients suffer delay).
RH corosync defaults are not suitable for VMs. I prefer SUSE's defaults.
Consider increasing the 'token' and 'consensus' to more
meaningful values -> start with a 10s token, for example.
>One would expect this to last just shy of a minute, then clients would
happily continue. However, connected clients are stuck with a non-working
mountpoint (commands such as df, ls, find etc. simply hang.. they go into an
uninterruptible sleep).
In regular HA NFS, there is a "notify" resource that notifies the
clients about the failover. The stale mount happens because your IP is brought
up before the NFS export is ready. As you haven't provided HA details, I can't
help much there.
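As a rough sketch only (resource names are placeholders, since I don't know
your pacemaker configuration), the ordering/colocation would look something like:

    # start the NFS service before the VIP, and keep the VIP on the node running it
    pcs constraint order start nfs-ganesha then vip
    pcs constraint colocation add vip with nfs-ganesha INFINITY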
>Mounts are 'hard' mounts to ensure guaranteed writes.
That's good. It's also needed for HA to work properly.
>- Once the number of files is over the 100,000 mark (again in a single,
unhashed folder) any operation on that share becomes very sluggish (even a df
on a client would take 20-30 seconds, and a find command would take minutes to
complete).
I think it's expected...
>If anyone can spot any ideas for improvement ?
I would first try switching to 'replica 3 arbiter 1', as the current setup is
wasting storage, and next switch the clients to FUSE.
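(Volume, host and brick names below are just placeholders.) A 'replica 3
arbiter 1' volume keeps two full copies plus a metadata-only arbiter brick:

    # third brick of each set holds only metadata, so no third full copy of the data
    gluster volume create myvol replica 3 arbiter 1 \
        host1:/bricks/b1 host2:/bricks/b1 host3:/bricks/arb1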
For performance improvements, I would add some SSDs in the game (tier 1+
storage) and use the SSD-based LUNs for LVM caching.
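For reference (VG, LV and device names are placeholders), the LVM caching part
is roughly:

    # add the SSD LUN to the brick's volume group and use it as a cache pool
    vgextend vg_bricks /dev/sdX
    lvcreate --type cache-pool -L 100G -n brick_cache vg_bricks /dev/sdX
    lvconvert --type cache --cachepool vg_bricks/brick_cache vg_bricks/brick_lv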
Best Regards,
Strahil Nikolov