Stefan Förster
2016-Nov-17 11:29 UTC
[Gluster-users] gsyncd_template.conf: Permission denied during geo-replication
Hello world,

I'm currently setting up a few Vagrant boxes to test out geo-replication. I have three boxes in each of two locations, fra-node[1-3] and chi-node[1-3]. I set up a single replicated volume per cluster (fra-volume and chi-volume) and then initiate a geo-replication from fra to chi.

After setting up the mountbroker on one of the slave nodes, restarting glusterd re-creates the file gsyncd_template.conf with root:root ownership:

#v+
[root@chi-node2 ~]# chgrp geogroup /var/lib/glusterd/geo-replication/gsyncd_template.conf
[root@chi-node2 ~]# chmod g+rw !$
chmod g+rw /var/lib/glusterd/geo-replication/gsyncd_template.conf
[root@chi-node2 ~]# ls -l !$
ls -l /var/lib/glusterd/geo-replication/gsyncd_template.conf
-rwxrwxr-x 1 root geogroup 1858 Nov 17 10:28 /var/lib/glusterd/geo-replication/gsyncd_template.conf
[root@chi-node2 ~]# service glusterd restart
Stopping glusterd:                                         [  OK  ]
Starting glusterd:                                         [  OK  ]
[root@chi-node2 ~]# ls -l /var/lib/glusterd/geo-replication/gsyncd_template.conf
-rwxr-xr-x 1 root root 1858 Nov 17 12:26 /var/lib/glusterd/geo-replication/gsyncd_template.conf
#v-

And geo-replication then fails:

#v+
Popen: ssh> [2016-11-17 09:33:56.824232] I [cli.c:730:main] 0-cli: Started running /usr/sbin/gluster with version 3.8.5
Popen: ssh> [2016-11-17 09:33:56.824321] I [cli.c:613:cli_rpc_init] 0-cli: Connecting to remote glusterd at localhost
Popen: ssh> [2016-11-17 09:33:56.907702] I [MSGID: 101190] [event-epoll.c:628:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1
Popen: ssh> [2016-11-17 09:33:56.908634] I [socket.c:2403:socket_event_handler] 0-transport: disconnecting now
Popen: ssh> [2016-11-17 09:33:56.908747] I [cli-rpc-ops.c:6655:gf_cli_getwd_cbk] 0-cli: Received resp to getwd
Popen: ssh> [2016-11-17 09:33:56.908833] I [input.c:31:cli_batch] 0-: Exiting with: 0
Popen: ssh> [2016-11-17 09:33:56.985793] E [syncdutils:279:log_raise_exception] <top>: FAIL:
Popen: ssh> Traceback (most recent call last):
Popen: ssh>   File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 203, in main
Popen: ssh>     main_i()
Popen: ssh>   File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 542, in main_i
Popen: ssh>     upgrade_config_file(rconf['config_file'], confdata)
Popen: ssh>   File "/usr/libexec/glusterfs/python/syncdaemon/configinterface.py", line 158, in upgrade_config_file
Popen: ssh>     shutil.move(tempConfigFile.name, path)
Popen: ssh>   File "/usr/lib64/python2.6/shutil.py", line 260, in move
Popen: ssh>     copy2(src, real_dst)
Popen: ssh>   File "/usr/lib64/python2.6/shutil.py", line 95, in copy2
Popen: ssh>     copyfile(src, dst)
Popen: ssh>   File "/usr/lib64/python2.6/shutil.py", line 51, in copyfile
Popen: ssh>     with open(dst, 'wb') as fdst:
Popen: ssh> IOError: [Errno 13] Permission denied: '/var/lib/glusterd/geo-replication/gsyncd_template.conf'
Popen: ssh> failed with IOError.
#v-

This seems related to: https://bugzilla.redhat.com/show_bug.cgi?id=1339683

Involved software versions:

glusterfs-3.8.5-1.el6.x86_64
glusterfs-api-3.8.5-1.el6.x86_64
glusterfs-cli-3.8.5-1.el6.x86_64
glusterfs-client-xlators-3.8.5-1.el6.x86_64
glusterfs-fuse-3.8.5-1.el6.x86_64
glusterfs-geo-replication-3.8.5-1.el6.x86_64
glusterfs-libs-3.8.5-1.el6.x86_64
glusterfs-server-3.8.5-1.el6.x86_64
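As a stopgap I've been considering simply re-applying the group ownership every time glusterd is restarted on the slave nodes, along these lines (untested sketch; geogroup and the template path are the ones from my setup above):

#v+
#!/bin/sh
# stopgap wrapper for the slave nodes: restart glusterd, then put back
# the group ownership/permissions that the restart resets on the template
TEMPLATE=/var/lib/glusterd/geo-replication/gsyncd_template.conf

service glusterd restart || exit 1

# give glusterd a moment to re-create the file before fixing it up again
for i in 1 2 3 4 5; do
    [ -f "$TEMPLATE" ] && break
    sleep 1
done

chgrp geogroup "$TEMPLATE"
chmod g+rw "$TEMPLATE"
#v-

That obviously just papers over the problem, since any restart that doesn't go through the wrapper resets the file again.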
Steps leading up to this:

1. On all nodes (I probably don't need to set all of this up on every node, but it shouldn't hurt, and it makes it easier if I ever want to switch the geo-replication direction):

   # install packages
   yum -q -y install glusterfs glusterfs-fuse glusterfs-server \
       glusterfs-geo-replication

   # add geoaccount:geogroup
   groupadd -r geogroup
   useradd -r -g geogroup -s /bin/bash -m geoaccount
   mkdir /home/geoaccount/.ssh

   # set up passwordless SSH
   cp /vagrant/geoaccount_key.pub /home/geoaccount/.ssh/authorized_keys
   chmod -R go-rwx /home/geoaccount/.ssh
   chown -R geoaccount. /home/geoaccount/.ssh
   mkdir /root/.ssh
   cp /vagrant/geoaccount_key /root/.ssh/id_rsa
   chmod -R go-rwx /root/.ssh

   # add mountbroker directory
   mkdir /var/mountbroker-root
   chmod 0711 /var/mountbroker-root

   # geo-replication directories and permissions
   mkdir -p /var/log/glusterfs/geo-replication-slaves /var/lib/glusterd/geo-replication
   chgrp -R geogroup /var/log/glusterfs/geo-replication-slaves
   chgrp -R geogroup /var/lib/glusterd/geo-replication
   chmod -R 770 /var/lib/glusterd/geo-replication
   chmod -R 770 /var/log/glusterfs/geo-replication-slaves

2. On the fra-node3 and chi-node3 servers ($area is either "fra" or "chi"):

   gluster peer probe ${area}-node1
   gluster peer probe ${area}-node2
   gluster volume create ${area}-volume replica $count transport tcp $nodes
   gluster volume start ${area}-volume
   gluster volume set all cluster.enable-shared-storage enable

3. On the chi-node3 server (I double-check the result with the grep shown after this list):

   gluster system:: execute mountbroker opt mountbroker-root /var/mountbroker-root
   gluster system:: execute mountbroker user geoaccount ${area}-volume
   gluster system:: execute mountbroker opt geo-replication-log-group geogroup
   gluster system:: execute mountbroker opt rpc-auth-allow-insecure on

4. On all chi-node* servers (since those are the replication targets), after setting up the mountbroker:

   service glusterd restart

5. On fra-node3:

   gluster system:: execute gsec_create
   gluster volume geo-replication fra-volume geoaccount@chi-node3::chi-volume create push-pem
   gluster volume geo-replication fra-volume geoaccount@chi-node3::chi-volume config use_meta_volume true
   gluster volume geo-replication fra-volume geoaccount@chi-node3::chi-volume start

6. On chi-node3:

   /usr/libexec/glusterfs/set_geo_rep_pem_keys.sh geoaccount fra-volume chi-volume
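For completeness, this is the quick check referenced in step 3 (run from the Vagrant host). I'm assuming here, based on the geo-replication docs, that the mountbroker commands end up as option lines (mountbroker-root, mountbroker-geo-replication.geoaccount, geo-replication-log-group, rpc-auth-allow-insecure) in /etc/glusterfs/glusterd.vol on the slave nodes:

#v+
# inspect the glusterd volfile and the template file on every slave node
for node in chi-node1 chi-node2 chi-node3; do
    echo "== $node =="
    vagrant ssh "$node" -c "sudo grep -E 'mountbroker|geo-replication-log-group|rpc-auth-allow-insecure' /etc/glusterfs/glusterd.vol"
    vagrant ssh "$node" -c "sudo ls -l /var/lib/glusterd/geo-replication/gsyncd_template.conf"
done
#v-

The ls is there to catch the root:root ownership shown at the top of this mail after the restart in step 4.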
Volume and geo-replication status:

$ vagrant ssh fra-node1 -c 'sudo gluster volume status fra-volume' ; \
  vagrant ssh chi-node1 -c 'sudo gluster volume status chi-volume' ; \
  vagrant ssh fra-node1 -c 'sudo gluster volume geo-replication status'

Status of volume: fra-volume
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick fra-node1:/data1/glusterfs            49152     0          Y       4499
Brick fra-node2:/data1/glusterfs            49152     0          Y       4434
Brick fra-node3:/data1/glusterfs            49152     0          Y       3753
Self-heal Daemon on localhost               N/A       N/A        Y       5091
Self-heal Daemon on fra-node3.vagrant.finan
cial.com                                    N/A       N/A        Y       4623
Self-heal Daemon on fra-node2               N/A       N/A        Y       5024

Task Status of Volume fra-volume
------------------------------------------------------------------------------
There are no active volume tasks

Connection to 127.0.0.1 closed.

Status of volume: chi-volume
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick chi-node1:/data1/glusterfs            49152     0          Y       4498
Brick chi-node2:/data1/glusterfs            49152     0          Y       4435
Brick chi-node3:/data1/glusterfs            49152     0          Y       3750
Self-heal Daemon on localhost               N/A       N/A        Y       5756
Self-heal Daemon on chi-node3.vagrant.finan
cial.com                                    N/A       N/A        Y       5900
Self-heal Daemon on chi-node2               N/A       N/A        Y       5667

Task Status of Volume chi-volume
------------------------------------------------------------------------------
There are no active volume tasks

Connection to 127.0.0.1 closed.

MASTER NODE    MASTER VOL    MASTER BRICK        SLAVE USER    SLAVE                                     SLAVE NODE    STATUS    CRAWL STATUS    LAST_SYNCED
-----------------------------------------------------------------------------------------------------------------------------------------------------------
fra-node1      fra-volume    /data1/glusterfs    geoaccount    ssh://geoaccount@chi-node3::chi-volume    N/A           Faulty    N/A             N/A
fra-node3      fra-volume    /data1/glusterfs    geoaccount    ssh://geoaccount@chi-node3::chi-volume    N/A           Faulty    N/A             N/A
fra-node2      fra-volume    /data1/glusterfs    geoaccount    ssh://geoaccount@chi-node3::chi-volume    N/A           Faulty    N/A             N/A

How can I convince glusterd to create that template with correct permissions?

Cheers,
Stefan