Erik Jacobson
2019-Nov-03 20:46 UTC
[Gluster-users] hook script question related to ctdb, shared storage, and bind mounts
I have a solution, which I have written about in the past, that is based on gluster with CTDB for IP failover and a level of redundancy. It's been working fine except for a few quirks I need to work out on giant clusters when I get access.

I have a 3x9 gluster volume. Each of the gluster servers is also an NFS server, using gluster NFS (ganesha isn't reliable for my workload yet). There are 9 IP aliases spread across 9 servers. I also have many bind mounts that use the shared storage as a source, plus the /gluster/lock volume ("ctdb") of course.

glusterfs 4.1.6 (rhel8 today, but I use rhel7, rhel8, sles12, and sles15)

Things work well when everything is up and running. IP failover works well when one of the servers goes down. My issue is when that server comes back up. Despite my best efforts with systemd fstab dependencies, the shared storage areas, including the gluster lock for CTDB, do not always get mounted before CTDB starts. This prevents CTDB from correctly rejoining the cluster. My bind mounts can also happen before the shared storage is mounted, despite my attempts at preventing this with dependencies in fstab.

I decided a better approach would be to use a gluster hook, mount everything I need as I need it, and start ctdb only once I have verified that /gluster/lock is really gluster and not a local disk.

I started down the road of doing this with a start post hook, and after spending a while on it I realized my logic error: that hook only fires when the volume is *started*, not when a server that was down rejoins.

I took a look at the code, glusterd-hooks.c, and found that support for "brick start" is not in place for a hook script, but it's nearly there:

    [GD_OP_START_BRICK] = EMPTY,
    ...

and there is no entry in glusterd_hooks_add_op_args() yet.

Before I make a patch for my own use, I wanted to do a sanity check and find out whether others have solved this in a better way than the road I'm heading down.
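As an illustration of the "verify that /gluster/lock is really gluster and not a local disk" check, here is a minimal sketch that reads the kernel mount table. The function names fstype_of and is_gluster_mount are made up for this example. It assumes the volume is mounted through the native FUSE client, which shows up in /proc/mounts with the type fuse.glusterfs.

```shell
#!/bin/bash
# Hypothetical helpers, not part of gluster or ctdb.

# Print the filesystem type recorded for an exact mount point.
# $1 = mount point, $2 = mount table to read (defaults to /proc/mounts).
# If the path is mounted more than once, the last (topmost) entry wins.
fstype_of() {
    awk -v mp="$1" '$2 == mp { t = $3 } END { if (t != "") print t }' \
        "${2:-/proc/mounts}"
}

# Succeed only if the path is a mount point of type fuse.glusterfs.
# A plain directory on local disk has no entry of its own and fails.
is_gluster_mount() {
    [ "$(fstype_of "$1" "$2")" = "fuse.glusterfs" ]
}
```

A caller could then guard the ctdb start with something like: is_gluster_mount /gluster/lock && systemctl start ctdb (the unit name ctdb is an assumption and may differ per distribution).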
What I was thinking of doing is enabling a brick start hook and doing my mount processing from there. However, I suppose brick start is a bad choice for the case of simply stopping and starting the volume, because my processing would try to complete before the gluster volume was fully started. It would probably work for a brick "coming back and joining" but not for "stop volume/start volume".

Any suggestions?

My end goal is:
- mount shared storage on every boot
- only attempt to mount when gluster is available (_netdev doesn't seem to be enough)
- never start ctdb unless /gluster/lock is shared storage and not a plain directory
- only do my bind mounts from shared storage into the rest of the layout when we are sure the shared storage is mounted (don't bind-mount using an empty directory as a source by accident!)

Thanks so much for reading my question,

Erik
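The end goals listed above could be sketched roughly as follows, whatever ends up triggering the script (a hook, a systemd unit, or something else). This is only an outline under stated assumptions: every function name is hypothetical, the retry counts are arbitrary, and it assumes findmnt with the -T (--target) option is available and that the FUSE client reports the type fuse.glusterfs.

```shell
#!/bin/bash
# Hypothetical helpers; names, retry counts and delays are illustrative.

# retry ATTEMPTS DELAY CMD...: run CMD until it succeeds or attempts run out.
retry() {
    local attempts=$1 delay=$2 i=1
    shift 2
    while [ "$i" -le "$attempts" ]; do
        "$@" && return 0
        i=$((i + 1))
        if [ "$i" -le "$attempts" ]; then sleep "$delay"; fi
    done
    return 1
}

# Succeed only if the filesystem containing PATH is a gluster FUSE mount.
# An unmounted directory resolves to its parent filesystem and fails.
on_gluster() {
    [ "$(findmnt -n -o FSTYPE -T "$1" 2>/dev/null)" = "fuse.glusterfs" ]
}

# One mount attempt, trusting the mount table rather than the exit code alone.
try_gluster_mount() {
    mount -t glusterfs "$1" "$2" && on_gluster "$2"
}

# ensure_gluster_mount SRC DIR: mount unless already mounted, retrying
# while gluster is still coming up.
ensure_gluster_mount() {
    on_gluster "$2" || retry 30 2 try_gluster_mount "$1" "$2"
}

# bind_from_gluster SRC DST: refuse to bind from anything not on gluster,
# so an empty local directory is never used as a source by accident.
bind_from_gluster() {
    on_gluster "$1" || return 1
    mountpoint -q "$2" || mount --bind "$1" "$2"
}

# Example sequence (paths, volume and unit names are assumptions):
#   ensure_gluster_mount localhost:/ctdb /gluster/lock || exit 1
#   bind_from_gluster /shared/some/dir /some/target    || exit 1
#   systemctl start ctdb
```

Because every step returns non-zero on failure, ctdb is only reached once the lock directory and the bind-mount sources have been confirmed to live on gluster.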