Olaf Buitelaar
2020-Nov-02 15:05 UTC
[Gluster-users] Self-Heal Daemon not starting after upgrade 6.10 to 7.8
Dear Gluster users,

I'm trying to upgrade from gluster 6.10 to 7.8. I've currently tried this on 2 hosts, but on both the Self-Heal Daemon refuses to start. It could be because not all nodes are updated yet, but I'm a bit hesitant to continue without the Self-Heal Daemon running.
I'm not using quotas, and I'm not seeing the peer-reject messages that other users reported on the mailing list. In fact, gluster peer status and gluster pool list display all nodes as connected. gluster v heal <vol> info also shows all nodes as Status: Connected, however some report pending heals which don't really seem to progress. Only in gluster v status <vol> do the 2 upgraded nodes report not running:

Self-heal Daemon on localhost               N/A       N/A        N       N/A
Self-heal Daemon on 10.32.9.5               N/A       N/A        Y       24022
Self-heal Daemon on 10.201.0.4              N/A       N/A        Y       26704
Self-heal Daemon on 10.201.0.3              N/A       N/A        N       N/A
Self-heal Daemon on 10.32.9.4               N/A       N/A        Y       46294
Self-heal Daemon on 10.32.9.3               N/A       N/A        Y       22194
Self-heal Daemon on 10.201.0.9              N/A       N/A        Y       14902
Self-heal Daemon on 10.201.0.6              N/A       N/A        Y       5358
Self-heal Daemon on 10.201.0.5              N/A       N/A        Y       28073
Self-heal Daemon on 10.201.0.7              N/A       N/A        Y       15385
Self-heal Daemon on 10.201.0.1              N/A       N/A        Y       8917
Self-heal Daemon on 10.201.0.12             N/A       N/A        Y       56796
Self-heal Daemon on 10.201.0.8              N/A       N/A        Y       7990
Self-heal Daemon on 10.201.0.11             N/A       N/A        Y       68223
Self-heal Daemon on 10.201.0.10             N/A       N/A        Y       20828

After the upgrade I see the file /var/lib/glusterd/vols/<vol>/<vol>-shd.vol being created, which doesn't exist on the 6.10 nodes.
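For reference, the non-running daemons can be pulled out of that status output mechanically. A small sketch that parses the pasted column layout (the here-string sample and the awk field positions are assumptions based on the output above; in practice you would pipe the live `gluster v status <vol>` output in):

```shell
# Sample mirroring the `gluster v status <vol>` lines above.
status_sample='Self-heal Daemon on localhost               N/A       N/A        N       N/A
Self-heal Daemon on 10.32.9.5               N/A       N/A        Y       24022
Self-heal Daemon on 10.201.0.3              N/A       N/A        N       N/A'

# Field 4 is the host; the next-to-last field is the Online flag (Y/N).
not_running=$(printf '%s\n' "$status_sample" \
  | awk '/^Self-heal Daemon/ && $(NF-1) == "N" { print $4 }')
echo "$not_running"
```

Run against the full output above, this would list only localhost and 10.201.0.3.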
In the logs I see these relevant messages:

log: glusterd.log

0-management: Regenerating volfiles due to a max op-version mismatch or glusterd.upgrade file not being present, op_version retrieved:60000, max op_version: 70200
[2020-10-31 21:48:42.256193] W [MSGID: 106204] [glusterd-store.c:3275:glusterd_store_update_volinfo] 0-management: Unknown key: tier-enabled
[2020-10-31 21:48:42.256232] W [MSGID: 106204] [glusterd-store.c:3275:glusterd_store_update_volinfo] 0-management: Unknown key: brick-0
[2020-10-31 21:48:42.256240] W [MSGID: 106204] [glusterd-store.c:3275:glusterd_store_update_volinfo] 0-management: Unknown key: brick-1
[2020-10-31 21:48:42.256246] W [MSGID: 106204] [glusterd-store.c:3275:glusterd_store_update_volinfo] 0-management: Unknown key: brick-2
[2020-10-31 21:48:42.256251] W [MSGID: 106204] [glusterd-store.c:3275:glusterd_store_update_volinfo] 0-management: Unknown key: brick-3
[2020-10-31 21:48:42.256256] W [MSGID: 106204] [glusterd-store.c:3275:glusterd_store_update_volinfo] 0-management: Unknown key: brick-4
[2020-10-31 21:48:42.256261] W [MSGID: 106204] [glusterd-store.c:3275:glusterd_store_update_volinfo] 0-management: Unknown key: brick-5
[2020-10-31 21:48:42.256266] W [MSGID: 106204] [glusterd-store.c:3275:glusterd_store_update_volinfo] 0-management: Unknown key: brick-6
[2020-10-31 21:48:42.256271] W [MSGID: 106204] [glusterd-store.c:3275:glusterd_store_update_volinfo] 0-management: Unknown key: brick-7
[2020-10-31 21:48:42.256276] W [MSGID: 106204] [glusterd-store.c:3275:glusterd_store_update_volinfo] 0-management: Unknown key: brick-8
[2020-10-31 21:51:36.049009] W [MSGID: 106617] [glusterd-svc-helper.c:948:glusterd_attach_svc] 0-glusterd: attach failed for glustershd(volume=backups)
[2020-10-31 21:51:36.049055] E [MSGID: 106048] [glusterd-shd-svc.c:482:glusterd_shdsvc_start] 0-glusterd: Failed to attach shd svc(volume=backups) to pid=9262
[2020-10-31 21:51:36.049138] E [MSGID: 106615] [glusterd-shd-svc.c:638:glusterd_shdsvc_restart] 0-management: Couldn't start shd for vol: backups on restart
[2020-10-31 21:51:36.183133] I [MSGID: 106618] [glusterd-svc-helper.c:901:glusterd_attach_svc] 0-glusterd: adding svc glustershd (volume=backups) to existing process with pid 9262

log: glustershd.log

[2020-10-31 21:49:55.976120] I [MSGID: 100041] [glusterfsd-mgmt.c:1111:glusterfs_handle_svc_attach] 0-glusterfs: received attach request for volfile-id=shd/backups
[2020-10-31 21:49:55.976136] W [MSGID: 100042] [glusterfsd-mgmt.c:1137:glusterfs_handle_svc_attach] 0-glusterfs: got attach for shd/backups but no active graph [Invalid argument]

So I suspect something in the logic for the self-heal daemon has changed, since it now has the new *.vol configuration for the shd. The question is: is this just a transitional state until all nodes are upgraded, and thus safe to continue the update? Or is this something that should be fixed, and if so, any clues how?

Thanks,
Olaf
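To judge whether the pending heals are actually progressing, the per-brick entry counts from `gluster v heal <vol> info` can be totalled and compared over time. A sketch against sample output (the brick names and counts below are made up for illustration; pipe the live command output in instead):

```shell
# Hypothetical `gluster v heal <vol> info` fragment.
heal_sample='Brick 10.201.0.1:/data/brick1
Status: Connected
Number of entries: 12

Brick 10.201.0.3:/data/brick1
Status: Connected
Number of entries: 3'

# Sum the per-brick entry counts; re-run later and compare the totals.
total=$(printf '%s\n' "$heal_sample" \
  | awk -F': ' '/^Number of entries:/ { n += $2 } END { print n + 0 }')
echo "pending heal entries: $total"
```

If the total stays flat across runs, the heals are stuck rather than merely slow.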
Ravishankar N
2020-Nov-03 11:17 UTC
[Gluster-users] Self-Heal Daemon not starting after upgrade 6.10 to 7.8
On 02/11/20 8:35 pm, Olaf Buitelaar wrote:
> Dear Gluster users,
>
> I'm trying to upgrade from gluster 6.10 to 7.8. I've currently tried
> this on 2 hosts, but on both the Self-Heal Daemon refuses to start.
> It could be because not all nodes are updated yet, but I'm a bit
> hesitant to continue without the Self-Heal Daemon running.
> I'm not using quotas, and I'm not seeing the peer-reject messages that
> other users reported on the mailing list. In fact, gluster peer status
> and gluster pool list display all nodes as connected. gluster v heal
> <vol> info also shows all nodes as Status: Connected, however some
> report pending heals which don't really seem to progress. Only in
> gluster v status <vol> do the 2 upgraded nodes report not running:
>
> Self-heal Daemon on localhost               N/A       N/A        N       N/A
> Self-heal Daemon on 10.32.9.5               N/A       N/A        Y       24022
> Self-heal Daemon on 10.201.0.4              N/A       N/A        Y       26704
> Self-heal Daemon on 10.201.0.3              N/A       N/A        N       N/A
> Self-heal Daemon on 10.32.9.4               N/A       N/A        Y       46294
> Self-heal Daemon on 10.32.9.3               N/A       N/A        Y       22194
> Self-heal Daemon on 10.201.0.9              N/A       N/A        Y       14902
> Self-heal Daemon on 10.201.0.6              N/A       N/A        Y       5358
> Self-heal Daemon on 10.201.0.5              N/A       N/A        Y       28073
> Self-heal Daemon on 10.201.0.7              N/A       N/A        Y       15385
> Self-heal Daemon on 10.201.0.1              N/A       N/A        Y       8917
> Self-heal Daemon on 10.201.0.12             N/A       N/A        Y       56796
> Self-heal Daemon on 10.201.0.8              N/A       N/A        Y       7990
> Self-heal Daemon on 10.201.0.11             N/A       N/A        Y       68223
> Self-heal Daemon on 10.201.0.10             N/A       N/A        Y       20828
>
> After the upgrade I see the file /var/lib/glusterd/vols/<vol>/<vol>-shd.vol
> being created, which doesn't exist on the 6.10 nodes.
>
> In the logs I see these relevant messages:
> log: glusterd.log
> 0-management: Regenerating volfiles due to a max op-version mismatch
> or glusterd.upgrade file not being present, op_version
> retrieved:60000, max op_version: 70200

I think this is because of the shd multiplex (https://bugzilla.redhat.com/show_bug.cgi?id=1659708) added by Rafi. Rafi, is there any workaround which can work for rolling upgrades? Or should we just do an offline upgrade of all server nodes for the shd to come online?

-Ravi
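For anyone hitting this: the per-volume shd volfile Olaf observed is a quick marker of which shd layout a node is on. A minimal sketch (the `shd_layout` helper is illustrative, not a gluster command, and the temp-dir demo just mimics the upgraded node's state directory):

```shell
# Hypothetical helper: gluster 7.x's multiplexed shd keeps a per-volume
# volfile under the glusterd state dir; gluster 6.x does not.
shd_layout() {
  # $1 = glusterd state dir (normally /var/lib/glusterd), $2 = volume name
  if [ -e "$1/vols/$2/$2-shd.vol" ]; then
    echo "7.x (shd volfile present)"
  else
    echo "6.x (no shd volfile)"
  fi
}

# Demo against a temp dir laid out like an upgraded node:
tmp=$(mktemp -d)
mkdir -p "$tmp/vols/backups"
touch "$tmp/vols/backups/backups-shd.vol"
shd_layout "$tmp" backups
```

On a real node you would call it as `shd_layout /var/lib/glusterd <vol>` on each peer to see how far the rolling upgrade has propagated.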