OOM is just a matter of time.
Today memory use is up to 177G/187G and:
# ps aux|grep glfsheal|wc -l
551
(well, one is actually the grep process, so "only" 550 glfsheal
processes).
I'll take the last 5:
root 3266352 0.5 0.0 600292 93044 ? Sl 06:55 0:07
/usr/libexec/glusterfs/glfsheal cluster_data info-summary --xml
root 3267220 0.7 0.0 600292 91964 ? Sl 07:00 0:07
/usr/libexec/glusterfs/glfsheal cluster_data info-summary --xml
root 3268076 1.0 0.0 600160 88216 ? Sl 07:05 0:08
/usr/libexec/glusterfs/glfsheal cluster_data info-summary --xml
root 3269492 1.6 0.0 600292 91248 ? Sl 07:10 0:07
/usr/libexec/glusterfs/glfsheal cluster_data info-summary --xml
root 3270354 4.4 0.0 600292 93260 ? Sl 07:15 0:07
/usr/libexec/glusterfs/glfsheal cluster_data info-summary --xml
-8<--
root at str957-clustor00:~# ps -o ppid= 3266352
3266345
root at str957-clustor00:~# ps -o ppid= 3267220
3267213
root at str957-clustor00:~# ps -o ppid= 3268076
3268069
root at str957-clustor00:~# ps -o ppid= 3269492
3269485
root at str957-clustor00:~# ps -o ppid= 3270354
3270347
root at str957-clustor00:~# ps aux|grep 3266345
root 3266345 0.0 0.0 430536 10764 ? Sl 06:55 0:00
gluster volume heal cluster_data info summary --xml
root 3271532 0.0 0.0 6260 2500 pts/1 S+ 07:21 0:00 grep
3266345
root at str957-clustor00:~# ps aux|grep 3267213
root 3267213 0.0 0.0 430536 10644 ? Sl 07:00 0:00
gluster volume heal cluster_data info summary --xml
root 3271599 0.0 0.0 6260 2480 pts/1 S+ 07:22 0:00 grep
3267213
root at str957-clustor00:~# ps aux|grep 3268069
root 3268069 0.0 0.0 430536 10704 ? Sl 07:05 0:00
gluster volume heal cluster_data info summary --xml
root 3271626 0.0 0.0 6260 2516 pts/1 S+ 07:22 0:00 grep
3268069
root at str957-clustor00:~# ps aux|grep 3269485
root 3269485 0.0 0.0 430536 10756 ? Sl 07:10 0:00
gluster volume heal cluster_data info summary --xml
root 3271647 0.0 0.0 6260 2480 pts/1 S+ 07:22 0:00 grep
3269485
root at str957-clustor00:~# ps aux|grep 3270347
root 3270347 0.0 0.0 430536 10672 ? Sl 07:15 0:00
gluster volume heal cluster_data info summary --xml
root 3271666 0.0 0.0 6260 2568 pts/1 S+ 07:22 0:00 grep
3270347
-8<--
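The manual per-PID checks above can be rolled into one loop; this is just a convenience sketch of the same ps/pgrep calls, not a fix:

```shell
# List every glfsheal process together with the parent command that
# spawned it (generalizes the manual "ps -o ppid=" checks above).
# pgrep -x matches the exact process name, so it doesn't catch itself.
for pid in $(pgrep -x glfsheal); do
    ppid=$(ps -o ppid= -p "$pid" | tr -d ' ')
    printf '%s <- %s: %s\n' "$pid" "$ppid" "$(ps -o args= -p "$ppid")"
done
```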
Seems every "gluster volume heal cluster_data info summary" invocation
(one every 5 minutes, judging by the start times) spawns a glfsheal
process that never terminates.
I can't rule out metadata corruption (or at least a desync), but it
shouldn't happen...
Diego
On 15/03/2023 20:11, Strahil Nikolov wrote:
> If you don't experience any OOM, you can focus on the heals.
>
> 284 processes of glfsheal seem odd.
>
> Can you check the ppid for 2-3 randomly picked ones?
> ps -o ppid= <pid>
>
> Best Regards,
> Strahil Nikolov
>
> On Wed, Mar 15, 2023 at 9:54, Diego Zuccato
> <diego.zuccato at unibo.it> wrote:
> I enabled it yesterday and that greatly reduced memory pressure.
> Current volume info:
> -8<--
> Volume Name: cluster_data
> Type: Distributed-Replicate
> Volume ID: a8caaa90-d161-45bb-a68c-278263a8531a
> Status: Started
> Snapshot Count: 0
> Number of Bricks: 45 x (2 + 1) = 135
> Transport-type: tcp
> Bricks:
> Brick1: clustor00:/srv/bricks/00/d
> Brick2: clustor01:/srv/bricks/00/d
> Brick3: clustor02:/srv/bricks/00/q (arbiter)
> [...]
> Brick133: clustor01:/srv/bricks/29/d
> Brick134: clustor02:/srv/bricks/29/d
> Brick135: clustor00:/srv/bricks/14/q (arbiter)
> Options Reconfigured:
> performance.quick-read: off
> cluster.entry-self-heal: on
> cluster.data-self-heal-algorithm: full
> cluster.metadata-self-heal: on
> cluster.shd-max-threads: 2
> network.inode-lru-limit: 500000
> performance.md-cache-timeout: 600
> performance.cache-invalidation: on
> features.cache-invalidation-timeout: 600
> features.cache-invalidation: on
> features.quota-deem-statfs: on
> performance.readdir-ahead: on
> cluster.granular-entry-heal: enable
> features.scrub: Active
> features.bitrot: on
> cluster.lookup-optimize: on
> performance.stat-prefetch: on
> performance.cache-refresh-timeout: 60
> performance.parallel-readdir: on
> performance.write-behind-window-size: 128MB
> cluster.self-heal-daemon: enable
> features.inode-quota: on
> features.quota: on
> transport.address-family: inet
> nfs.disable: on
> performance.client-io-threads: off
> client.event-threads: 1
> features.scrub-throttle: normal
> diagnostics.brick-log-level: ERROR
> diagnostics.client-log-level: ERROR
> config.brick-threads: 0
> cluster.lookup-unhashed: on
> config.client-threads: 1
> cluster.use-anonymous-inode: off
> diagnostics.brick-sys-log-level: CRITICAL
> features.scrub-freq: monthly
> cluster.data-self-heal: on
> cluster.brick-multiplex: on
> cluster.daemon-log-level: ERROR
> -8<--
>
> htop reports that memory usage is up to 143G, there are 602 tasks and
> 5232 threads (~20 running) on clustor00, 117G/49 tasks/1565 threads on
> clustor01 and 126G/45 tasks/1574 threads on clustor02.
> I see quite a lot (284!) of glfsheal processes running on clustor00 (a
> "gluster v heal cluster_data info summary" has been running on clustor02
> since yesterday, still with no output). Shouldn't it be just one per brick?
>
> Diego
>
> On 15/03/2023 08:30, Strahil Nikolov wrote:
> > Do you use brick multiplexing ?
> >
> > Best Regards,
> > Strahil Nikolov
> >
> >    On Tue, Mar 14, 2023 at 16:44, Diego Zuccato
> >    <diego.zuccato at unibo.it> wrote:
> >    Hello all.
> >
> >    Our Gluster 9.6 cluster is showing increasing problems.
> >    Currently it's composed of 3 servers (2x Intel Xeon 4210 [20 cores
> >    dual thread, total 40 threads], 192GB RAM, 30x HGST HUH721212AL5200
> >    [12TB]), configured in replica 3 arbiter 1. Using Debian packages
> >    from the Gluster 9.x latest repository.
> >
> >    Seems 192GB of RAM is not enough to handle 30 data bricks + 15
> >    arbiters, and I often had to reload glusterfsd because glusterfs
> >    processes got killed by the OOM killer.
> >    On top of that, performance has been quite bad, especially when we
> >    reached about 20M files. Moreover, one of the servers has had mobo
> >    issues that resulted in memory errors that corrupted some bricks'
> >    filesystems (XFS; it required "xfs_repair -L" to fix).
> >    Now I'm getting lots of "stale file handle" errors and other errors
> >    (like directories that seem empty from the client but still contain
> >    files in some bricks), and auto healing seems unable to complete.
> >
> >    Since I can't keep up continuing to manually fix all the issues,
> >    I'm thinking about a backup+destroy+recreate strategy.
> >
> >    I think that if I reduce the number of bricks per server to just 5
> >    (RAID1 of 6x12TB disks) I might resolve the RAM issues - at the
> >    cost of longer heal times in case a disk fails. Am I right, or is
> >    it useless? Other recommendations?
> >    Servers have space for another 6 disks. Maybe those could be used
> >    for some SSDs to speed up access?
> >
> >    TIA.
> >
> >    --
> >    Diego Zuccato
> >    DIFA - Dip. di Fisica e Astronomia
> >    Servizi Informatici
> >    Alma Mater Studiorum - Università di Bologna
> >    V.le Berti-Pichat 6/2 - 40127 Bologna - Italy
> >    tel.: +39 051 20 95786
> >    ________
> >
> >    Community Meeting Calendar:
> >
> >    Schedule -
> >    Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
> >    Bridge: https://meet.google.com/cpu-eiue-hvk
> >    Gluster-users mailing list
> >    Gluster-users at gluster.org
> >    https://lists.gluster.org/mailman/listinfo/gluster-users
>
> >
>
> --
> Diego Zuccato
> DIFA - Dip. di Fisica e Astronomia
> Servizi Informatici
> Alma Mater Studiorum - Università di Bologna
> V.le Berti-Pichat 6/2 - 40127 Bologna - Italy
> tel.: +39 051 20 95786
--
Diego Zuccato
DIFA - Dip. di Fisica e Astronomia
Servizi Informatici
Alma Mater Studiorum - Università di Bologna
V.le Berti-Pichat 6/2 - 40127 Bologna - Italy
tel.: +39 051 20 95786