Hi,

referring to this thread:
https://lists.gluster.org/pipermail/gluster-users/2024-January/040465.html
especially:
https://lists.gluster.org/pipermail/gluster-users/2024-January/040513.html

I've updated and rebooted 3 servers (Debian bookworm) running gluster 11.1. The first 2 servers went fine (gluster volume ok, no heals), so after a couple of minutes I rebooted the 3rd server. Now I have the same problem again: the number of pending heals keeps counting up, but no heals happen. gluster volume status+info are ok, gluster peer status is ok.

Full volume status+info: https://pastebin.com/aEEEKn7h

Volume Name: sourceimages
Type: Replicate
Volume ID: d6a559a1-ca4c-48c7-8adf-89048333bb58
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: gluster188:/gluster/md3/sourceimages
Brick2: gluster189:/gluster/md3/sourceimages
Brick3: gluster190:/gluster/md3/sourceimages

Internal IPs:
gluster188: 192.168.0.188
gluster189: 192.168.0.189
gluster190: 192.168.0.190

After rebooting the 3rd server (gluster190), the client info looks like this:

gluster volume status sourceimages clients
Client connections for volume sourceimages
----------------------------------------------
Brick : gluster188:/gluster/md3/sourceimages
Clients connected : 17
Hostname                BytesRead   BytesWritten   OpVersion
--------                ---------   ------------   ---------
192.168.0.188:49151     1047856     988364         110000
192.168.0.189:49149     930792      654096         110000
192.168.0.109:49147     271598      279908         110000
192.168.0.223:49147     126764      130964         110000
192.168.0.222:49146     125848      130144         110000
192.168.0.2:49147       273756      43400387       110000
192.168.0.15:49147      57248531    14327465       110000
192.168.0.126:49147     32282645    671284763      110000
192.168.0.94:49146      125520      128864         110000
192.168.0.66:49146      34086248    666519388      110000
192.168.0.99:49146      3051076     522652843      110000
192.168.0.16:49146      149773024   1049035        110000
192.168.0.110:49146     1574768     566124922      110000
192.168.0.106:49146     152640790   146483580      110000
192.168.0.91:49133      89548971    82709793       110000
192.168.0.190:49149     4132        6540           110000
192.168.0.118:49133     92176       92884          110000
----------------------------------------------
Brick : gluster189:/gluster/md3/sourceimages
Clients connected : 17
Hostname                BytesRead   BytesWritten   OpVersion
--------                ---------   ------------   ---------
192.168.0.188:49146     935172      658268         110000
192.168.0.189:49151     1039048     977920         110000
192.168.0.126:49146     27106555    231766764      110000
192.168.0.110:49147     1121696     226426262      110000
192.168.0.16:49147      147165735   994015         110000
192.168.0.106:49147     152476618   1091156        110000
192.168.0.94:49147      109612      112688         110000
192.168.0.109:49146     180819      1489715        110000
192.168.0.223:49146     110708      114316         110000
192.168.0.99:49147      2573412     157737429      110000
192.168.0.2:49145       242696      26088710       110000
192.168.0.222:49145     109728      113064         110000
192.168.0.66:49145      27003740    215124678      110000
192.168.0.15:49145      57217513    594699         110000
192.168.0.91:49132      89463431    2714920        110000
192.168.0.190:49148     4132        6540           110000
192.168.0.118:49131     92380       94996          110000
----------------------------------------------
Brick : gluster190:/gluster/md3/sourceimages
Clients connected : 2
Hostname                BytesRead   BytesWritten   OpVersion
--------                ---------   ------------   ---------
192.168.0.190:49151     21252       27988          110000
192.168.0.118:49132     92176       92884          110000

The bad server (gluster190) has only 2 clients: itself and 192.168.0.118 (which was rebooted after gluster190). I remounted the volume on the other clients (without a reboot) and they appear now. But the most important thing: the other 2 gluster servers are still missing. Output shortened, with the regular clients removed:

gluster volume status sourceimages clients
Client connections for volume sourceimages
----------------------------------------------
Brick : gluster188:/gluster/md3/sourceimages
Clients connected : 17
Hostname                BytesRead   BytesWritten   OpVersion
--------                ---------   ------------   ---------
192.168.0.188:49151     3707272     3387700        110000
192.168.0.189:49149     3346388     2264688        110000
192.168.0.190:49149     4132        6540           110000
----------------------------------------------
Brick : gluster189:/gluster/md3/sourceimages
Clients connected : 17
Hostname                BytesRead   BytesWritten   OpVersion
--------                ---------   ------------   ---------
192.168.0.189:49151     3698464     3377496        110000
192.168.0.188:49146     3350768     2268260        110000
192.168.0.190:49148     4132        6540           110000
----------------------------------------------
Brick : gluster190:/gluster/md3/sourceimages
Clients connected : 15
Hostname                BytesRead   BytesWritten   OpVersion
--------                ---------   ------------   ---------
192.168.0.190:49151     38692       49988          110000
----------------------------------------------

The 2 good (peer) servers are missing on the 3rd/bad server. As these are not normal clients: how do I re-add/re-connect them? The 3 servers do not mount the volume on any mountpoint during normal service.

Best regards,
Hubert
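For anyone wanting to run the same check without reading the tables by eye, here is a minimal sketch that lists, per brick, which peer servers do not appear as connected clients. The function name `missing_peers` is hypothetical, and the parsing assumes that client rows start with `IP:port` and that brick headers look like `Brick : host:/path`, as in the output above; other gluster versions may format this differently.

```shell
#!/bin/sh
# Sketch: list which peer servers have no client connection to each brick.
# Reads `gluster volume status <VOLNAME> clients` output on stdin;
# the peer IPs to look for are passed as arguments.
# Assumption: client rows start with "IP:port" (as in this thread).
missing_peers() {
    awk -v peers="$*" '
        BEGIN { n = split(peers, want, " ") }

        # Print every wanted peer that was not seen under the current brick.
        function report(   i) {
            if (brick == "") return
            for (i = 1; i <= n; i++)
                if (!((brick, want[i]) in seen))
                    print brick, "missing", want[i]
        }

        # A "Brick : host:/path" header closes the previous brick section.
        $1 == "Brick" && $2 == ":" {
            report()
            brick = $3
            next
        }

        # Client row: strip the port and remember the IP for this brick.
        $1 ~ /^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+:[0-9]+$/ {
            ip = $1
            sub(/:[0-9]+$/, "", ip)
            seen[brick, ip] = 1
        }

        END { report() }
    '
}
```

Usage would be something like `gluster volume status sourceimages clients | missing_peers 192.168.0.188 192.168.0.189 192.168.0.190`. An empty result means every brick sees all three servers.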
Ah, logs: nothing in glustershd.log on the 3 gluster servers. But on one client, in /var/log/glusterfs/data-sourceimages.log:

[2024-04-23 06:54:21.456157 +0000] W [MSGID: 114061] [client-common.c:796:client_pre_lk_v2] 0-sourceimages-client-2: remote_fd is -1. EBADFD [{gfid=a1817071-2949-4145-a96a-874159e46511}, {errno=77}, {error=File descriptor in bad state}]
[2024-04-23 06:54:21.456195 +0000] E [MSGID: 108028] [afr-open.c:361:afr_is_reopen_allowed_cbk] 0-sourceimages-replicate-0: Failed getlk for a1817071-2949-4145-a96a-874159e46511 [File descriptor in bad state]
[2024-04-23 06:54:21.488511 +0000] W [MSGID: 114061] [client-common.c:530:client_pre_flush_v2] 0-sourceimages-client-2: remote_fd is -1. EBADFD [{gfid=a1817071-2949-4145-a96a-874159e46511}, {errno=77}, {error=File descriptor in bad state}]
Howdy,

I was able to solve the problem. I had 2 options: reset-brick (i.e. reconfigure) or replace-brick (i.e. full sync). I tried reset-brick first:

gluster volume reset-brick sourceimages gluster190:/gluster/md3/sourceimages start
[... do nothing ...]
gluster volume reset-brick sourceimages gluster190:/gluster/md3/sourceimages gluster190:/gluster/md3/sourceimages commit force

After that the pending heals started, going down to 0 pretty fast, and the connected clients are now identical for all 3 servers.

Thanks for reading,
Hubert
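The reset-brick sequence above could be wrapped roughly like this. This is only a sketch under assumptions: the helper names `run` and `reset_brick_in_place` and the `DRY_RUN` switch are made up for illustration, and the final heal-count query is an addition for watching progress, not something the fix itself needs. With `DRY_RUN=1` the commands are only printed, which is a sensible first pass before touching a live volume.

```shell
#!/bin/sh
# Sketch of the reset-brick sequence from this thread.
# Set DRY_RUN=1 to print the commands instead of executing them.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Usage: reset_brick_in_place VOLNAME HOST:/brick/path
reset_brick_in_place() {
    vol=$1
    brick=$2
    # Step 1: take the brick out of service.
    run gluster volume reset-brick "$vol" "$brick" start || return 1
    # Step 2: commit the same brick path again. In this thread the
    # pending heals started after this step.
    run gluster volume reset-brick "$vol" "$brick" "$brick" commit force || return 1
    # Optional: show the number of entries still waiting to be healed.
    run gluster volume heal "$vol" statistics heal-count
}
```

For the volume in this thread the call would be `reset_brick_in_place sourceimages gluster190:/gluster/md3/sourceimages`.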