In Debian, stopping glusterd does not stop the brick processes: to stop
everything (and free the memory) I have to run
systemctl stop glusterd
killall glusterfs{,d}
killall glfsheal
systemctl start glusterd
[this behaviour hangs a simple reboot of a machine running glusterd...
not nice]
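Something like this rough sketch (the pgrep sanity check is just an extra
precaution I'd add; adjust the pattern to your setup):
-8<--
# stop glusterd, kill leftover brick/heal processes, verify, restart
systemctl stop glusterd
killall glusterfs glusterfsd glfsheal
sleep 2
# only start glusterd again if nothing gluster-related is still running
pgrep -a 'glusterfs|glfsheal' || systemctl start glusterd
-8<--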
For now I just restarted glusterd w/o killing the bricks:
root@str957-clustor00:~# ps aux|grep glfsheal|wc -l ; systemctl restart glusterd ; ps aux|grep glfsheal|wc -l
618
618
No change in either the glfsheal process count or free memory :(
Should I "killall glfsheal" before OOM kicks in?
Diego
On 16/03/2023 12:37, Strahil Nikolov wrote:
> Can you restart the glusterd service (first check that it was not modified
> to kill the bricks)?
>
> Best Regards,
> Strahil Nikolov
>
> On Thu, Mar 16, 2023 at 8:26, Diego Zuccato
> <diego.zuccato at unibo.it> wrote:
> OOM is just a matter of time.
>
> Today mem use is up to 177G/187G and:
> # ps aux|grep glfsheal|wc -l
> 551
>
> (well, one is actually the grep process, so "only" 550 glfsheal
> processes).
>
> I'll take the last 5:
> root     3266352  0.5  0.0 600292 93044 ?        Sl   06:55   0:07
> /usr/libexec/glusterfs/glfsheal cluster_data info-summary --xml
> root     3267220  0.7  0.0 600292 91964 ?        Sl   07:00   0:07
> /usr/libexec/glusterfs/glfsheal cluster_data info-summary --xml
> root     3268076  1.0  0.0 600160 88216 ?        Sl   07:05   0:08
> /usr/libexec/glusterfs/glfsheal cluster_data info-summary --xml
> root     3269492  1.6  0.0 600292 91248 ?        Sl   07:10   0:07
> /usr/libexec/glusterfs/glfsheal cluster_data info-summary --xml
> root     3270354  4.4  0.0 600292 93260 ?        Sl   07:15   0:07
> /usr/libexec/glusterfs/glfsheal cluster_data info-summary --xml
>
> -8<--
> root@str957-clustor00:~# ps -o ppid= 3266352
> 3266345
> root@str957-clustor00:~# ps -o ppid= 3267220
> 3267213
> root@str957-clustor00:~# ps -o ppid= 3268076
> 3268069
> root@str957-clustor00:~# ps -o ppid= 3269492
> 3269485
> root@str957-clustor00:~# ps -o ppid= 3270354
> 3270347
> root@str957-clustor00:~# ps aux|grep 3266345
> root     3266345  0.0  0.0 430536 10764 ?        Sl   06:55   0:00
> gluster volume heal cluster_data info summary --xml
> root     3271532  0.0  0.0   6260  2500 pts/1    S+   07:21   0:00 grep
> 3266345
> root@str957-clustor00:~# ps aux|grep 3267213
> root     3267213  0.0  0.0 430536 10644 ?        Sl   07:00   0:00
> gluster volume heal cluster_data info summary --xml
> root     3271599  0.0  0.0   6260  2480 pts/1    S+   07:22   0:00 grep
> 3267213
> root@str957-clustor00:~# ps aux|grep 3268069
> root     3268069  0.0  0.0 430536 10704 ?        Sl   07:05   0:00
> gluster volume heal cluster_data info summary --xml
> root     3271626  0.0  0.0   6260  2516 pts/1    S+   07:22   0:00 grep
> 3268069
> root@str957-clustor00:~# ps aux|grep 3269485
> root     3269485  0.0  0.0 430536 10756 ?        Sl   07:10   0:00
> gluster volume heal cluster_data info summary --xml
> root     3271647  0.0  0.0   6260  2480 pts/1    S+   07:22   0:00 grep
> 3269485
> root@str957-clustor00:~# ps aux|grep 3270347
> root     3270347  0.0  0.0 430536 10672 ?        Sl   07:15   0:00
> gluster volume heal cluster_data info summary --xml
> root     3271666  0.0  0.0   6260  2568 pts/1    S+   07:22   0:00 grep
> 3270347
> -8<--
>
> Seems glfsheal is spawning more processes.
> I can't rule out a metadata corruption (or at least a desync), but it
> shouldn't happen...
>
> Diego
>
> On 15/03/2023 20:11, Strahil Nikolov wrote:
> > If you don't experience any OOM, you can focus on the heals.
> >
> > 284 processes of glfsheal seem odd.
> >
> > Can you check the ppid for 2-3 randomly picked ones?
> > ps -o ppid= <pid>
> >
> > Best Regards,
> > Strahil Nikolov
> >
> >    On Wed, Mar 15, 2023 at 9:54, Diego Zuccato
> >    <diego.zuccato at unibo.it> wrote:
> >    I enabled it yesterday and that greatly reduced memory pressure.
> >    Current volume info:
> >    -8<--
> >    Volume Name: cluster_data
> >    Type: Distributed-Replicate
> >    Volume ID: a8caaa90-d161-45bb-a68c-278263a8531a
> >    Status: Started
> >    Snapshot Count: 0
> >    Number of Bricks: 45 x (2 + 1) = 135
> >    Transport-type: tcp
> >    Bricks:
> >    Brick1: clustor00:/srv/bricks/00/d
> >    Brick2: clustor01:/srv/bricks/00/d
> >    Brick3: clustor02:/srv/bricks/00/q (arbiter)
> >    [...]
> >    Brick133: clustor01:/srv/bricks/29/d
> >    Brick134: clustor02:/srv/bricks/29/d
> >    Brick135: clustor00:/srv/bricks/14/q (arbiter)
> >    Options Reconfigured:
> >    performance.quick-read: off
> >    cluster.entry-self-heal: on
> >    cluster.data-self-heal-algorithm: full
> >    cluster.metadata-self-heal: on
> >    cluster.shd-max-threads: 2
> >    network.inode-lru-limit: 500000
> >    performance.md-cache-timeout: 600
> >    performance.cache-invalidation: on
> >    features.cache-invalidation-timeout: 600
> >    features.cache-invalidation: on
> >    features.quota-deem-statfs: on
> >    performance.readdir-ahead: on
> >    cluster.granular-entry-heal: enable
> >    features.scrub: Active
> >    features.bitrot: on
> >    cluster.lookup-optimize: on
> >    performance.stat-prefetch: on
> >    performance.cache-refresh-timeout: 60
> >    performance.parallel-readdir: on
> >    performance.write-behind-window-size: 128MB
> >    cluster.self-heal-daemon: enable
> >    features.inode-quota: on
> >    features.quota: on
> >    transport.address-family: inet
> >    nfs.disable: on
> >    performance.client-io-threads: off
> >    client.event-threads: 1
> >    features.scrub-throttle: normal
> >    diagnostics.brick-log-level: ERROR
> >    diagnostics.client-log-level: ERROR
> >    config.brick-threads: 0
> >    cluster.lookup-unhashed: on
> >    config.client-threads: 1
> >    cluster.use-anonymous-inode: off
> >    diagnostics.brick-sys-log-level: CRITICAL
> >    features.scrub-freq: monthly
> >    cluster.data-self-heal: on
> >    cluster.brick-multiplex: on
> >    cluster.daemon-log-level: ERROR
> >    -8<--
> >
> >    htop reports that memory usage is up to 143G, there are 602 tasks and
> >    5232 threads (~20 running) on clustor00, 117G/49 tasks/1565 threads on
> >    clustor01 and 126G/45 tasks/1574 threads on clustor02.
> >    I see quite a lot (284!) of glfsheal processes running on clustor00 (a
> >    "gluster v heal cluster_data info summary" has been running on clustor02
> >    since yesterday, still no output). Shouldn't there be just one per brick?
> >
> >    Diego
> >
> >    On 15/03/2023 08:30, Strahil Nikolov wrote:
> >    > Do you use brick multiplexing?
> >    >
> >    > Best Regards,
> >    > Strahil Nikolov
> >    >
> >    >    On Tue, Mar 14, 2023 at 16:44, Diego Zuccato
> >    >    <diego.zuccato at unibo.it> wrote:
> >    >    Hello all.
> >    >
> >    >    Our Gluster 9.6 cluster is showing increasing problems.
> >    >    Currently it's composed of 3 servers (2x Intel Xeon 4210 [20 cores
> >    >    dual thread, total 40 threads], 192GB RAM, 30x HGST HUH721212AL5200
> >    >    [12TB]), configured in replica 3 arbiter 1. Using Debian packages
> >    >    from the Gluster 9.x latest repository.
> >    >
> >    >    Seems 192G RAM are not enough to handle 30 data bricks + 15 arbiters,
> >    >    and I often had to reload glusterfsd because glusterfs processes got
> >    >    killed for OOM.
> >    >    On top of that, performance has been quite bad, especially when we
> >    >    reached about 20M files. Moreover, one of the servers has had mobo
> >    >    issues that resulted in memory errors that corrupted some bricks'
> >    >    fs (XFS, it required "xfs_repair -L" to fix).
> >    >    Now I'm getting lots of "stale file handle" errors and other errors
> >    >    (like directories that seem empty from the client but still contain
> >    >    files in some bricks) and auto healing seems unable to complete.
> >    >
> >    >    Since I can't keep up with manually fixing all the issues, I'm
> >    >    thinking about a backup+destroy+recreate strategy.
> >    >
> >    >    I think that if I reduce the number of bricks per server to just 5
> >    >    (RAID1 of 6x12TB disks) I might resolve the RAM issues - at the cost
> >    >    of longer heal times in case a disk fails. Am I right, or is it
> >    >    useless? Other recommendations?
> >    >    Servers have space for another 6 disks. Maybe those could be used
> >    >    for some SSDs to speed up access?
> >    >
> >    >    TIA.
> >    >
> >    >    --
> >    >    Diego Zuccato
> >    >    DIFA - Dip. di Fisica e Astronomia
> >    >    Servizi Informatici
> >    >    Alma Mater Studiorum - Università di Bologna
> >    >    V.le Berti-Pichat 6/2 - 40127 Bologna - Italy
> >    >    tel.: +39 051 20 95786
> >
> >    --
> >    Diego Zuccato
> >    DIFA - Dip. di Fisica e Astronomia
> >    Servizi Informatici
> >    Alma Mater Studiorum - Università di Bologna
> >    V.le Berti-Pichat 6/2 - 40127 Bologna - Italy
> >    tel.: +39 051 20 95786
> >
>
> --
> Diego Zuccato
> DIFA - Dip. di Fisica e Astronomia
> Servizi Informatici
> Alma Mater Studiorum - Università di Bologna
> V.le Berti-Pichat 6/2 - 40127 Bologna - Italy
> tel.: +39 051 20 95786
>
--
Diego Zuccato
DIFA - Dip. di Fisica e Astronomia
Servizi Informatici
Alma Mater Studiorum - Università di Bologna
V.le Berti-Pichat 6/2 - 40127 Bologna - Italy
tel.: +39 051 20 95786