Ilias Chasapakis forumZFD
2024-Apr-09 08:05 UTC
[Gluster-users] Glusterfs 10.5-1 healing issues
Dear all,

we would like to describe a situation that we have had for a long time and that has not been resolved, despite many minor and major upgrades of GlusterFS.

We use a KVM environment: the GlusterFS servers run as VMs, and the host servers are updated regularly. The hosts are heterogeneous hardware, but configured with the same characteristics. The VMs have also been harmonized to use the virtio drivers where available, and the resources reserved are the same on each host. The physical switch for the hosts has been replaced with a reliable one. Probing peers is and has been quite quick on the heartbeat network, and communication between the servers apparently has no issues or disruptions. I say "apparently" because what we actually see is:

- always pending failed heals, which used to resolve after a rotated reboot of the Gluster VMs (replica 3). Restarting only the GlusterFS-related services (daemon, events, etc.) has no effect; only a reboot brings results.
- very often the failed heals are directories.

We recently removed a brick that was on a VM on a host that has been entirely replaced. We re-added the brick, the sync ran, all data was eventually synced, and the brick started with 0 pending failed heals. Now it develops failed heals too, like its fellow bricks. Please take into account that we had healed all the failed entries (manually, with various methods) before adding the third brick. After some days of operation, the count of failed heals rises again, not very fast, but definitely with new entries (which may or may not resolve with rotated reboots).

We also have Gluster clients on CTDB nodes that connect to the cluster and mount via the GlusterFS client. Windows roaming profiles shared via SMB frequently become corrupted (they are composed of a great number of small files, but are large in total size). The Gluster bricks are formatted with XFS.

What we also observe is that mounting with the vfs option in Samba on the CTDB nodes shows some kind of delay.
This means that you can see the shared folder, for example, from a Windows client machine via one CTDB node, but not via another CTDB node in the cluster, and then after a while it appears there too. And this frequently st

This is an excerpt of entries from our shd logs:

> [2024-04-08 10:13:26.213596 +0000] I [MSGID: 108026] [afr-self-heal-entry.c:1080:afr_selfheal_entry_do] 0-gv-ho-replicate-0: performing full entry selfheal on 2c621415-6223-4b66-a4ca-3f6f267a448d
> [2024-04-08 10:14:08.135911 +0000] W [MSGID: 114031] [client-rpc-fops_v2.c:2457:client4_0_link_cbk] 0-gv-ho-client-5: remote operation failed. [{source=<gfid:91d83f0e-1864-4ff3-9174-b7c956e20596>}, {target=(null)}, {errno=116}, {error=Veraltete Dateizugriffsnummer (file handle)}]
> [2024-04-08 10:15:59.135908 +0000] W [MSGID: 114061] [client-common.c:2992:client_pre_readdir_v2] 0-gv-ho-client-5: remote_fd is -1. EBADFD [{gfid=6b5e599e-c836-4ebe-b16a-8224425b88c7}, {errno=77}, {error=Die Dateizugriffsnummer ist in schlechter Verfassung}]
> [2024-04-08 10:30:25.013592 +0000] I [MSGID: 108026] [afr-self-heal-entry.c:1080:afr_selfheal_entry_do] 0-gv-ho-replicate-0: performing full entry selfheal on 24e82e12-5512-4679-9eb3-8bd098367db7
> [2024-04-08 10:33:17.613594 +0000] W [MSGID: 114031] [client-rpc-fops_v2.c:2457:client4_0_link_cbk] 0-gv-ho-client-5: remote operation failed. [{source=<gfid:ef9068fc-a329-4a21-88d2-265ecd3d208c>}, {target=(null)}, {errno=116}, {error=Veraltete Dateizugriffsnummer (file handle)}]
> [2024-04-08 10:33:21.201359 +0000] W [MSGID: 114031] [client-rpc-fops_v2.c:2457:client4_0_link_cbk] 0-gv-ho-client-5: remote operation failed. [{source

(The German error strings translate to "Stale file handle" (errno 116, ESTALE) and "File descriptor in bad state" (errno 77, EBADFD).)

How are the clients mapped to real hosts, in order to know on which one's logs to look? We would like to proceed by exclusion to finally eradicate this, possibly in a conservative way (without rebuilding everything), and we are becoming clueless as to where to look, as we have also tried various option settings regarding performance etc.
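As an aside on the two questions raised by the excerpt: the errno values can be decoded with Python's standard library (Linux errno numbers), and a translator name like "gv-ho-client-5" refers to a protocol/client subvolume defined in the client volfile (on the servers, typically under /var/lib/glusterd/vols/<volume>/), whose options name the remote host and brick path. A minimal sketch; the embedded volfile excerpt, host name, and brick path are hypothetical and only illustrate the structure:

```python
import errno
import os
import re

# Decode the errno values seen in the shd log (Linux numbering).
print(errno.errorcode[116], "->", os.strerror(116))
print(errno.errorcode[77], "->", os.strerror(77))

# Hypothetical client-volfile excerpt: each protocol/client subvolume
# records which host and brick it talks to.
SAMPLE_VOLFILE = """
volume gv-ho-client-5
    type protocol/client
    option remote-host server3.example
    option remote-subvolume /data/brick1/gv-ho
end-volume
"""

def client_to_brick(volfile_text):
    """Map each protocol/client subvolume name to (remote-host, brick path)."""
    mapping = {}
    for name, body in re.findall(r"volume (\S+)\n(.*?)end-volume", volfile_text, re.S):
        host = re.search(r"option remote-host (\S+)", body)
        path = re.search(r"option remote-subvolume (\S+)", body)
        if host and path:
            mapping[name] = (host.group(1), path.group(1))
    return mapping

print(client_to_brick(SAMPLE_VOLFILE))
```

With a real volfile, the mapping tells you which server's brick logs to inspect for a given "0-gv-ho-client-N" message.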
Here is the option set on our main volume:

> cluster.lookup-unhashed  on (DEFAULT)
> cluster.lookup-optimize  on (DEFAULT)
> cluster.min-free-disk  10% (DEFAULT)
> cluster.min-free-inodes  5% (DEFAULT)
> cluster.rebalance-stats  off (DEFAULT)
> cluster.subvols-per-directory  (null) (DEFAULT)
> cluster.readdir-optimize  off (DEFAULT)
> cluster.rsync-hash-regex  (null) (DEFAULT)
> cluster.extra-hash-regex  (null) (DEFAULT)
> cluster.dht-xattr-name  trusted.glusterfs.dht (DEFAULT)
> cluster.randomize-hash-range-by-gfid  off (DEFAULT)
> cluster.rebal-throttle  normal (DEFAULT)
> cluster.lock-migration  off
> cluster.force-migration  off
> cluster.local-volume-name  (null) (DEFAULT)
> cluster.weighted-rebalance  on (DEFAULT)
> cluster.switch-pattern  (null) (DEFAULT)
> cluster.entry-change-log  on (DEFAULT)
> cluster.read-subvolume  (null) (DEFAULT)
> cluster.read-subvolume-index  -1 (DEFAULT)
> cluster.read-hash-mode  1 (DEFAULT)
> cluster.background-self-heal-count  8 (DEFAULT)
> cluster.metadata-self-heal  on
> cluster.data-self-heal  on
> cluster.entry-self-heal  on
> cluster.self-heal-daemon  enable
> cluster.heal-timeout  600 (DEFAULT)
> cluster.self-heal-window-size  8 (DEFAULT)
> cluster.data-change-log  on (DEFAULT)
> cluster.metadata-change-log  on (DEFAULT)
> cluster.data-self-heal-algorithm  (null) (DEFAULT)
> cluster.eager-lock  on (DEFAULT)
> disperse.eager-lock  on (DEFAULT)
> disperse.other-eager-lock  on (DEFAULT)
> disperse.eager-lock-timeout  1 (DEFAULT)
> disperse.other-eager-lock-timeout  1 (DEFAULT)
> cluster.quorum-type  auto
> cluster.quorum-count  2
> cluster.choose-local  true (DEFAULT)
> cluster.self-heal-readdir-size  1KB (DEFAULT)
> cluster.post-op-delay-secs  1 (DEFAULT)
> cluster.ensure-durability  on (DEFAULT)
> cluster.consistent-metadata  no (DEFAULT)
> cluster.heal-wait-queue-length  128 (DEFAULT)
> cluster.favorite-child-policy  none
> cluster.full-lock  yes (DEFAULT)
> cluster.optimistic-change-log  on (DEFAULT)
> diagnostics.latency-measurement  off
> diagnostics.dump-fd-stats  off (DEFAULT)
> diagnostics.count-fop-hits  off
> diagnostics.brick-log-level  INFO
> diagnostics.client-log-level  INFO
> diagnostics.brick-sys-log-level  CRITICAL (DEFAULT)
> diagnostics.client-sys-log-level  CRITICAL (DEFAULT)
> diagnostics.brick-logger  (null) (DEFAULT)
> diagnostics.client-logger  (null) (DEFAULT)
> diagnostics.brick-log-format  (null) (DEFAULT)
> diagnostics.client-log-format  (null) (DEFAULT)
> diagnostics.brick-log-buf-size  5 (DEFAULT)
> diagnostics.client-log-buf-size  5 (DEFAULT)
> diagnostics.brick-log-flush-timeout  120 (DEFAULT)
> diagnostics.client-log-flush-timeout  120 (DEFAULT)
> diagnostics.stats-dump-interval  0 (DEFAULT)
> diagnostics.fop-sample-interval  0 (DEFAULT)
> diagnostics.stats-dump-format  json (DEFAULT)
> diagnostics.fop-sample-buf-size  65535 (DEFAULT)
> diagnostics.stats-dnscache-ttl-sec  86400 (DEFAULT)
> performance.cache-max-file-size  10
> performance.cache-min-file-size  0 (DEFAULT)
> performance.cache-refresh-timeout  1 (DEFAULT)
> performance.cache-priority  (DEFAULT)
> performance.io-cache-size  32MB (DEFAULT)
> performance.cache-size  32MB (DEFAULT)
> performance.io-thread-count  16 (DEFAULT)
> performance.high-prio-threads  16 (DEFAULT)
> performance.normal-prio-threads  16 (DEFAULT)
> performance.low-prio-threads  16 (DEFAULT)
> performance.least-prio-threads  1 (DEFAULT)
> performance.enable-least-priority  on (DEFAULT)
> performance.iot-watchdog-secs  (null) (DEFAULT)
> performance.iot-cleanup-disconnected-reqs  off (DEFAULT)
> performance.iot-pass-through  false (DEFAULT)
> performance.io-cache-pass-through  false (DEFAULT)
> performance.quick-read-cache-size  128MB (DEFAULT)
> performance.cache-size  128MB (DEFAULT)
> performance.quick-read-cache-timeout  1 (DEFAULT)
> performance.qr-cache-timeout  600
> performance.quick-read-cache-invalidation  false (DEFAULT)
> performance.ctime-invalidation  false (DEFAULT)
> performance.flush-behind  on (DEFAULT)
> performance.nfs.flush-behind  on (DEFAULT)
> performance.write-behind-window-size  4MB
> performance.resync-failed-syncs-after-fsync  off (DEFAULT)
> performance.nfs.write-behind-window-size  1MB (DEFAULT)
> performance.strict-o-direct  off (DEFAULT)
> performance.nfs.strict-o-direct  off (DEFAULT)
> performance.strict-write-ordering  off (DEFAULT)
> performance.nfs.strict-write-ordering  off (DEFAULT)
> performance.write-behind-trickling-writes  on (DEFAULT)
> performance.aggregate-size  128KB (DEFAULT)
> performance.nfs.write-behind-trickling-writes  on (DEFAULT)
> performance.lazy-open  yes (DEFAULT)
> performance.read-after-open  yes (DEFAULT)
> performance.open-behind-pass-through  false (DEFAULT)
> performance.read-ahead-page-count  4 (DEFAULT)
> performance.read-ahead-pass-through  false (DEFAULT)
> performance.readdir-ahead-pass-through  false (DEFAULT)
> performance.md-cache-pass-through  false (DEFAULT)
> performance.write-behind-pass-through  false (DEFAULT)
> performance.md-cache-timeout  600
> performance.cache-swift-metadata  false (DEFAULT)
> performance.cache-samba-metadata  on
> performance.cache-capability-xattrs  true (DEFAULT)
> performance.cache-ima-xattrs  true (DEFAULT)
> performance.md-cache-statfs  off (DEFAULT)
> performance.xattr-cache-list  (DEFAULT)
> performance.nl-cache-pass-through  false (DEFAULT)
> network.frame-timeout  1800 (DEFAULT)
> network.ping-timeout  20
> network.tcp-window-size  (null) (DEFAULT)
> client.ssl  off
> network.remote-dio  disable (DEFAULT)
> client.event-threads  4
> client.tcp-user-timeout  0
> client.keepalive-time  20
> client.keepalive-interval  2
> client.keepalive-count  9
> client.strict-locks  off
> network.tcp-window-size  (null) (DEFAULT)
> network.inode-lru-limit  200000
> auth.allow  *
> auth.reject  (null) (DEFAULT)
> transport.keepalive  1
> server.allow-insecure  on (DEFAULT)
> server.root-squash  off (DEFAULT)
> server.all-squash  off (DEFAULT)
> server.anonuid  65534 (DEFAULT)
> server.anongid  65534 (DEFAULT)
> server.statedump-path  /var/run/gluster (DEFAULT)
> server.outstanding-rpc-limit  64 (DEFAULT)
> server.ssl  off
> auth.ssl-allow  *
> server.manage-gids  off (DEFAULT)
> server.dynamic-auth  on (DEFAULT)
> client.send-gids  on (DEFAULT)
> server.gid-timeout  300 (DEFAULT)
> server.own-thread  (null) (DEFAULT)
> server.event-threads  4
> server.tcp-user-timeout  42 (DEFAULT)
> server.keepalive-time  20
> server.keepalive-interval  2
> server.keepalive-count  9
> transport.listen-backlog  1024
> ssl.own-cert  (null) (DEFAULT)
> ssl.private-key  (null) (DEFAULT)
> ssl.ca-list  (null) (DEFAULT)
> ssl.crl-path  (null) (DEFAULT)
> ssl.certificate-depth  (null) (DEFAULT)
> ssl.cipher-list  (null) (DEFAULT)
> ssl.dh-param  (null) (DEFAULT)
> ssl.ec-curve  (null) (DEFAULT)
> transport.address-family  inet
> performance.write-behind  off
> performance.read-ahead  on
> performance.readdir-ahead  on
> performance.io-cache  off
> performance.open-behind  on
> performance.quick-read  on
> performance.nl-cache  on
> performance.stat-prefetch  on
> performance.client-io-threads  off
> performance.nfs.write-behind  on
> performance.nfs.read-ahead  off
> performance.nfs.io-cache  off
> performance.nfs.quick-read  off
> performance.nfs.stat-prefetch  off
> performance.nfs.io-threads  off
> performance.force-readdirp  true (DEFAULT)
> performance.cache-invalidation  on
> performance.global-cache-invalidation  true (DEFAULT)
> features.uss  off
> features.snapshot-directory  .snaps
> features.show-snapshot-directory  off
> features.tag-namespaces  off
> network.compression  off
> network.compression.window-size  -15 (DEFAULT)
> network.compression.mem-level  8 (DEFAULT)
> network.compression.min-size  0 (DEFAULT)
> network.compression.compression-level  -1 (DEFAULT)
> network.compression.debug  false (DEFAULT)
> features.default-soft-limit  80% (DEFAULT)
> features.soft-timeout  60 (DEFAULT)
> features.hard-timeout  5 (DEFAULT)
> features.alert-time  86400 (DEFAULT)
> features.quota-deem-statfs  off
> geo-replication.indexing  off
> geo-replication.indexing  off
> geo-replication.ignore-pid-check  off
> geo-replication.ignore-pid-check  off
> features.quota  off
> features.inode-quota  off
> features.bitrot  disable
> debug.trace  off
> debug.log-history  no (DEFAULT)
> debug.log-file  no (DEFAULT)
> debug.exclude-ops  (null) (DEFAULT)
> debug.include-ops  (null) (DEFAULT)
> debug.error-gen  off
> debug.error-failure  (null) (DEFAULT)
> debug.error-number  (null) (DEFAULT)
> debug.random-failure  off (DEFAULT)
> debug.error-fops  (null) (DEFAULT)
> nfs.disable  on
> features.read-only  off (DEFAULT)
> features.worm  off
> features.worm-file-level  off
> features.worm-files-deletable  on
> features.default-retention-period  120 (DEFAULT)
> features.retention-mode  relax (DEFAULT)
> features.auto-commit-period  180 (DEFAULT)
> storage.linux-aio  off (DEFAULT)
> storage.linux-io_uring  off (DEFAULT)
> storage.batch-fsync-mode  reverse-fsync (DEFAULT)
> storage.batch-fsync-delay-usec  0 (DEFAULT)
> storage.owner-uid  -1 (DEFAULT)
> storage.owner-gid  -1 (DEFAULT)
> storage.node-uuid-pathinfo  off (DEFAULT)
> storage.health-check-interval  30 (DEFAULT)
> storage.build-pgfid  off (DEFAULT)
> storage.gfid2path  on (DEFAULT)
> storage.gfid2path-separator  : (DEFAULT)
> storage.reserve  1 (DEFAULT)
> storage.health-check-timeout  20 (DEFAULT)
> storage.fips-mode-rchecksum  on
> storage.force-create-mode  0000 (DEFAULT)
> storage.force-directory-mode  0000 (DEFAULT)
> storage.create-mask  0777 (DEFAULT)
> storage.create-directory-mask  0777 (DEFAULT)
> storage.max-hardlinks  100 (DEFAULT)
> features.ctime  on (DEFAULT)
> config.gfproxyd  off
> cluster.server-quorum-type  server
> cluster.server-quorum-ratio  51
> changelog.changelog  off (DEFAULT)
> changelog.changelog-dir  {{ brick.path }}/.glusterfs/changelogs (DEFAULT)
> changelog.encoding  ascii (DEFAULT)
> changelog.rollover-time  15 (DEFAULT)
> changelog.fsync-interval  5 (DEFAULT)
> changelog.changelog-barrier-timeout  120
> changelog.capture-del-path  off (DEFAULT)
> features.barrier  disable
> features.barrier-timeout  120
> features.trash  off (DEFAULT)
> features.trash-dir  .trashcan (DEFAULT)
> features.trash-eliminate-path  (null) (DEFAULT)
> features.trash-max-filesize  5MB (DEFAULT)
> features.trash-internal-op  off (DEFAULT)
> cluster.enable-shared-storage  disable
> locks.trace  off (DEFAULT)
> locks.mandatory-locking  off (DEFAULT)
> cluster.disperse-self-heal-daemon  enable (DEFAULT)
> cluster.quorum-reads  no (DEFAULT)
> client.bind-insecure  (null) (DEFAULT)
> features.timeout  45 (DEFAULT)
> features.failover-hosts  (null) (DEFAULT)
> features.shard  off
> features.shard-block-size  64MB (DEFAULT)
> features.shard-lru-limit  16384 (DEFAULT)
> features.shard-deletion-rate  100 (DEFAULT)
> features.scrub-throttle  lazy
> features.scrub-freq  biweekly
> features.scrub  false (DEFAULT)
> features.expiry-time  120
> features.signer-threads  4
> features.cache-invalidation  on
> features.cache-invalidation-timeout  600
> ganesha.enable  off
> features.leases  off
> features.lease-lock-recall-timeout  60 (DEFAULT)
> disperse.background-heals  8 (DEFAULT)
> disperse.heal-wait-qlength  128 (DEFAULT)
> cluster.heal-timeout  600 (DEFAULT)
> dht.force-readdirp  on (DEFAULT)
> disperse.read-policy  gfid-hash (DEFAULT)
> cluster.shd-max-threads  4
> cluster.shd-wait-qlength  1024 (DEFAULT)
> cluster.locking-scheme  full (DEFAULT)
> cluster.granular-entry-heal  no (DEFAULT)
> features.locks-revocation-secs  0 (DEFAULT)
> features.locks-revocation-clear-all  false (DEFAULT)
> features.locks-revocation-max-blocked  0 (DEFAULT)
> features.locks-monkey-unlocking  false (DEFAULT)
> features.locks-notify-contention  yes (DEFAULT)
> features.locks-notify-contention-delay  5 (DEFAULT)
> disperse.shd-max-threads  1 (DEFAULT)
> disperse.shd-wait-qlength  4096
> disperse.cpu-extensions  auto (DEFAULT)
> disperse.self-heal-window-size  32 (DEFAULT)
> cluster.use-compound-fops  off
> performance.parallel-readdir  on
> performance.rda-request-size  131072
> performance.rda-low-wmark  4096 (DEFAULT)
> performance.rda-high-wmark  128KB (DEFAULT)
> performance.rda-cache-limit  10MB
> performance.nl-cache-positive-entry  false (DEFAULT)
> performance.nl-cache-limit  10MB
> performance.nl-cache-timeout  600
> cluster.brick-multiplex  disable
> cluster.brick-graceful-cleanup  disable
> glusterd.vol_count_per_thread  100
> cluster.max-bricks-per-process  250
> disperse.optimistic-change-log  on (DEFAULT)
> disperse.stripe-cache  4 (DEFAULT)
> cluster.halo-enabled  False (DEFAULT)
> cluster.halo-shd-max-latency  99999 (DEFAULT)
> cluster.halo-nfsd-max-latency  5 (DEFAULT)
> cluster.halo-max-latency  5 (DEFAULT)
> cluster.halo-max-replicas  99999 (DEFAULT)
> cluster.halo-min-replicas  2 (DEFAULT)
> features.selinux  on
> cluster.daemon-log-level  INFO
> debug.delay-gen  off
> delay-gen.delay-percentage  10% (DEFAULT)
> delay-gen.delay-duration  100000 (DEFAULT)
> delay-gen.enable  (DEFAULT)
> disperse.parallel-writes  on (DEFAULT)
> disperse.quorum-count  0 (DEFAULT)
> features.sdfs  off
> features.cloudsync  off
> features.ctime  on
> ctime.noatime  on
> features.cloudsync-storetype  (null) (DEFAULT)
> features.enforce-mandatory-lock  off
> config.global-threading  off
> config.client-threads  16
> config.brick-threads  16
> features.cloudsync-remote-read  off
> features.cloudsync-store-id  (null) (DEFAULT)
> features.cloudsync-product-id  (null) (DEFAULT)
> features.acl  enable
> cluster.use-anonymous-inode  yes
> rebalance.ensure-durability  on (DEFAULT)

Again, sorry for the long post. We would be happy to have this solved, as we are excited to be using GlusterFS and would like to get back to a stable configuration. We always appreciate the spirit of collaboration and mutual help on this list.

Best
Ilias

-- 
forumZFD
Entschieden für Frieden | Committed to Peace

Ilias Chasapakis
Referent IT | IT Consultant

Forum Ziviler Friedensdienst e.V. | Forum Civil Peace Service
Am Kölner Brett 8 | 50825 Köln | Germany
Tel 0221 91273243 | Fax 0221 91273299 | http://www.forumZFD.de

Vorstand nach § 26 BGB, einzelvertretungsberechtigt | Executive Board:
Alexander Mauz, Sonja Wiekenberg-Mlalandle, Jens von Bargen
VR 17651 Amtsgericht Köln
Spenden | Donations: IBAN DE90 4306 0967 4103 7264 00 BIC GENODEM1GLS

-------------- next part --------------
A non-text attachment was scrubbed...
Name: OpenPGP_signature.asc
Type: application/pgp-signature
Size: 665 bytes
Desc: OpenPGP digital signature
URL: <http://lists.gluster.org/pipermail/gluster-users/attachments/20240409/681609fe/attachment.sig>
The big one I see for you is to investigate and enable sharding. It can improve performance and makes it much easier to heal VM-style workloads. Be aware that once you turn it on, you can't go back easily, and you need to copy the VM disk images around to get them sharded before it will show any real effect.

A couple of other recommendations from my main volume (three dedicated host servers with HDDs and SSD/NVMe caching and log volumes on ZFS). The cluster.shd-* entries are especially recommended. This is on Gluster 9.4 at the moment, so some of these won't map exactly.

Volume Name: gv1
Type: Replicate
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Options Reconfigured:
cluster.read-hash-mode: 3
performance.client-io-threads: on
performance.write-behind-window-size: 64MB
performance.cache-size: 1G
nfs.disable: on
performance.readdir-ahead: on
performance.quick-read: off
performance.read-ahead: on
performance.io-cache: off
performance.stat-prefetch: on
cluster.eager-lock: enable
network.remote-dio: enable
server.event-threads: 4
client.event-threads: 8
performance.io-thread-count: 64
performance.low-prio-threads: 32
features.shard: on
features.shard-block-size: 64MB
cluster.locking-scheme: granular
cluster.data-self-heal-algorithm: full
cluster.shd-max-threads: 8
cluster.shd-wait-qlength: 10240
cluster.choose-local: false
cluster.granular-entry-heal: enable

Otherwise, more details about your servers (CPU, RAM, and disks) would be useful for suggestions, and details of your network as well. And if you haven't done kernel-level tuning on the servers, you should address that too. These all vary a lot by workload and hardware setup, so there aren't many generic recommendations I can give other than to make sure you have tuned your TCP stack and enabled the "none" disk elevator on SSDs or on disks used by ZFS. There is a lot of tuning discussion in the archives if you go searching as well.
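For reference, the heal-related options above are applied per volume with "gluster volume set". A small sketch that generates the corresponding CLI commands for the original poster's volume (the name "gv-ho" is assumed from the log excerpt; the option values come from the recommendations above). Printing rather than executing keeps this safe to review before running anything on a live cluster:

```python
# Heal-related settings recommended above; values taken from the gv1 example.
HEAL_SETTINGS = {
    "cluster.granular-entry-heal": "enable",
    "cluster.locking-scheme": "granular",
    "cluster.data-self-heal-algorithm": "full",
    "cluster.shd-max-threads": "8",
    "cluster.shd-wait-qlength": "10240",
}

VOLUME = "gv-ho"  # assumed volume name; adjust to your environment

# Emit one "gluster volume set" command per option for review.
commands = [f"gluster volume set {VOLUME} {opt} {val}"
            for opt, val in HEAL_SETTINGS.items()]
print("\n".join(commands))
```

The generated lines can be pasted into a shell one at a time; test on a non-production volume first, since features.shard in particular is effectively one-way once enabled.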
-Darrell

> On Apr 9, 2024, at 3:05 AM, Ilias Chasapakis forumZFD <chasapakis at forumZFD.de> wrote:
> [...]
>> cluster.server-quorum-type server >> cluster.server-quorum-ratio 51 >> changelog.changelog off (DEFAULT) >> changelog.changelog-dir {{ brick.path }}/.glusterfs/changelogs (DEFAULT) >> changelog.encoding ascii (DEFAULT) >> changelog.rollover-time 15 (DEFAULT) >> changelog.fsync-interval 5 (DEFAULT) >> changelog.changelog-barrier-timeout 120 >> changelog.capture-del-path off (DEFAULT) >> features.barrier disable >> features.barrier-timeout 120 >> features.trash off (DEFAULT) >> features.trash-dir .trashcan (DEFAULT) >> features.trash-eliminate-path (null) (DEFAULT) >> features.trash-max-filesize 5MB (DEFAULT) >> features.trash-internal-op off (DEFAULT) >> cluster.enable-shared-storage disable >> locks.trace off (DEFAULT) >> locks.mandatory-locking off (DEFAULT) >> cluster.disperse-self-heal-daemon enable (DEFAULT) >> cluster.quorum-reads no (DEFAULT) >> client.bind-insecure (null) (DEFAULT) >> features.timeout 45 (DEFAULT) >> features.failover-hosts (null) (DEFAULT) >> features.shard off >> features.shard-block-size 64MB (DEFAULT) >> features.shard-lru-limit 16384 (DEFAULT) >> features.shard-deletion-rate 100 (DEFAULT) >> features.scrub-throttle lazy >> features.scrub-freq biweekly >> features.scrub false (DEFAULT) >> features.expiry-time 120 >> features.signer-threads 4 >> features.cache-invalidation on >> features.cache-invalidation-timeout 600 >> ganesha.enable off >> features.leases off >> features.lease-lock-recall-timeout 60 (DEFAULT) >> disperse.background-heals 8 (DEFAULT) >> disperse.heal-wait-qlength 128 (DEFAULT) >> cluster.heal-timeout 600 (DEFAULT) >> dht.force-readdirp on (DEFAULT) >> disperse.read-policy gfid-hash (DEFAULT) >> cluster.shd-max-threads 4 >> cluster.shd-wait-qlength 1024 (DEFAULT) >> cluster.locking-scheme full (DEFAULT) >> cluster.granular-entry-heal no (DEFAULT) >> features.locks-revocation-secs 0 (DEFAULT) >> features.locks-revocation-clear-all false (DEFAULT) >> features.locks-revocation-max-blocked 0 (DEFAULT) >> 
features.locks-monkey-unlocking false (DEFAULT) >> features.locks-notify-contention yes (DEFAULT) >> features.locks-notify-contention-delay 5 (DEFAULT) >> disperse.shd-max-threads 1 (DEFAULT) >> disperse.shd-wait-qlength 4096 >> disperse.cpu-extensions auto (DEFAULT) >> disperse.self-heal-window-size 32 (DEFAULT) >> cluster.use-compound-fops off >> performance.parallel-readdir on >> performance.rda-request-size 131072 >> performance.rda-low-wmark 4096 (DEFAULT) >> performance.rda-high-wmark 128KB (DEFAULT) >> performance.rda-cache-limit 10MB >> performance.nl-cache-positive-entry false (DEFAULT) >> performance.nl-cache-limit 10MB >> performance.nl-cache-timeout 600 >> cluster.brick-multiplex disable >> cluster.brick-graceful-cleanup disable >> glusterd.vol_count_per_thread 100 >> cluster.max-bricks-per-process 250 >> disperse.optimistic-change-log on (DEFAULT) >> disperse.stripe-cache 4 (DEFAULT) >> cluster.halo-enabled False (DEFAULT) >> cluster.halo-shd-max-latency 99999 (DEFAULT) >> cluster.halo-nfsd-max-latency 5 (DEFAULT) >> cluster.halo-max-latency 5 (DEFAULT) >> cluster.halo-max-replicas 99999 (DEFAULT) >> cluster.halo-min-replicas 2 (DEFAULT) >> features.selinux on >> cluster.daemon-log-level INFO >> debug.delay-gen off >> delay-gen.delay-percentage 10% (DEFAULT) >> delay-gen.delay-duration 100000 (DEFAULT) >> delay-gen.enable (DEFAULT) >> disperse.parallel-writes on (DEFAULT) >> disperse.quorum-count 0 (DEFAULT) >> features.sdfs off >> features.cloudsync off >> features.ctime on >> ctime.noatime on >> features.cloudsync-storetype (null) (DEFAULT) >> features.enforce-mandatory-lock off >> config.global-threading off >> config.client-threads 16 >> config.brick-threads 16 >> features.cloudsync-remote-read off >> features.cloudsync-store-id (null) (DEFAULT) >> features.cloudsync-product-id (null) (DEFAULT) >> features.acl enable >> cluster.use-anonymous-inode yes >> rebalance.ensure-durability on (DEFAULT) > > Again, sorry for the long post. 
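For completeness, these are the kinds of standard CLI checks we run on the nodes to watch the failed-heal entries come and go; the volume name `gvol` below is a placeholder, not our real volume name:

```shell
# Placeholder volume name; substitute the actual volume.
VOL=gvol

# Per-brick list of entries still pending heal (paths/GFIDs that keep reappearing)
gluster volume heal "$VOL" info

# Compact per-brick counters: pending, in split-brain, possibly undergoing heal
gluster volume heal "$VOL" info summary

# Quick count of entries per brick awaiting the self-heal daemon
gluster volume heal "$VOL" statistics heal-count

# Heal-related options to compare against the dump above
gluster volume get "$VOL" cluster.heal-timeout
gluster volume get "$VOL" cluster.granular-entry-heal
```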
We would be happy to see this solved, as we are excited to be using GlusterFS and would like to return to a stable configuration.

We always appreciate the spirit of collaboration and reciprocal help on this list.

Best
Ilias

--
forumZFD
Entschieden für Frieden | Committed to Peace

Ilias Chasapakis
Referent IT | IT Consultant

Forum Ziviler Friedensdienst e.V. | Forum Civil Peace Service
Am Kölner Brett 8 | 50825 Köln | Germany

Tel 0221 91273243 | Fax 0221 91273299 | http://www.forumZFD.de

Vorstand nach § 26 BGB, einzelvertretungsberechtigt | Executive Board:
Alexander Mauz, Sonja Wiekenberg-Mlalandle, Jens von Bargen
VR 17651 Amtsgericht Köln

Spenden|Donations: IBAN DE90 4306 0967 4103 7264 00 BIC GENODEM1GLS