Alexander Iliev
2020-Apr-02 00:08 UTC
[Gluster-users] GlusterFS geo-replication progress question
Hi all,

I have a running geo-replication session between two clusters and I'm trying to figure out what the current progress of the replication is and possibly how much longer it will take.

It has been running for quite a while now (> 1 month), but both the hardware of the nodes and the link between the two clusters aren't that great (e.g., the volumes are backed by rotating disks), and the volume is somewhat sizeable (30-ish TB), so I'm not really sure how long it is supposed to take normally.

Several bricks in the volume (same brick size and physical layout in both clusters) are now showing up with a Changelog Crawl status and a recent LAST_SYNCED date in the `gluster volume geo-replication status detail` output, which seems to be the desired state for all bricks. The rest of the bricks, though, are in Hybrid Crawl state and have been there forever.

So I suppose my questions are: how can I tell whether the replication session is somehow broken, and if it's not, is there a way for me to find out the progress and the ETA of the replication?

In /var/log/glusterfs/geo-replication/$session_dir/gsyncd.log there are some errors like:

[2020-03-31 11:48:47.81269] E [syncdutils(worker /data/gfs/store1/8/brick):822:errlog] Popen: command returned error cmd=ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i /var/lib/glusterd/geo-replication/secret.pem -p 22 -oControlMaster=auto -S /tmp/gsyncd-aux-ssh-6aDWmc/206c4b2c3eb782ea2cf49ab5142bd68b.sock x.x.x.x /nonexistent/gsyncd slave <vol> x.x.x.x::<vol> --master-node x.x.x.x --master-node-id 9476b8bb-d7ee-489a-b083-875805343e67 --master-brick <brick_path> --local-node x.x.x.x 2 --local-node-id 426b564d-35d9-4291-980e-795903e9a386 --slave-timeout 120 --slave-log-level INFO --slave-gluster-log-level INFO --slave-gluster-command-dir /usr/sbin error=1
[2020-03-31 11:48:47.81617] E [syncdutils(worker <brick_path>):826:logerr] Popen: ssh> failed with ValueError.
[2020-03-31 11:48:47.390397] I [repce(agent <brick_path>):97:service_loop] RepceServer: terminating on reaching EOF.

In the brick logs I see stuff like:

[2020-03-29 07:49:05.338947] E [fuse-bridge.c:4167:fuse_xattr_cbk] 0-glusterfs-fuse: extended attribute not supported by the backend storage

I don't know if these are critical; from the rest of the logs it looks like data is traveling between the clusters.

Any help will be greatly appreciated. Thank you in advance!

Best regards,
--
alexander iliev
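P.S. For reference, a rough sketch of the checks I have in mind -- the names in angle brackets are placeholders (not my real volume/host names), and the xattr test is only a sanity check prompted by the fuse_xattr_cbk error above:

    # per-brick crawl status and last-synced time, run on a master node
    gluster volume geo-replication <mastervol> <slavehost>::<slavevol> status detail

    # check that the brick backend accepts user extended attributes at all
    # (run directly against the brick path, not the FUSE mount)
    setfattr -n user.georep.test -v 1 /data/gfs/store1/8/brick
    getfattr -n user.georep.test /data/gfs/store1/8/brick
    setfattr -x user.georep.test /data/gfs/store1/8/brick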
Sunny Kumar
2020-Apr-06 22:25 UTC
[Gluster-users] GlusterFS geo-replication progress question
Hi Alexander,

Answers inline below:

On Thu, Apr 2, 2020 at 1:08 AM Alexander Iliev <ailiev+gluster at mamul.org> wrote:
>
> Hi all,
>
> I have a running geo-replication session between two clusters and I'm
> trying to figure out what the current progress of the replication is
> and possibly how much longer it will take.
>
> It has been running for quite a while now (> 1 month), but both the
> hardware of the nodes and the link between the two clusters aren't
> that great (e.g., the volumes are backed by rotating disks), and the
> volume is somewhat sizeable (30-ish TB), so I'm not really sure how
> long it is supposed to take normally.
>
> Several bricks in the volume (same brick size and physical layout in
> both clusters) are now showing up with a Changelog Crawl status and a
> recent LAST_SYNCED date in the
> `gluster volume geo-replication status detail` output, which seems to
> be the desired state for all bricks. The rest of the bricks, though,
> are in Hybrid Crawl state and have been there forever.
>
> So I suppose my questions are: how can I tell whether the replication
> session is somehow broken, and if it's not, is there a way for me to
> find out the progress and the ETA of the replication?
>

Please go through the status section of the documentation[1], which covers this. For Hybrid Crawl we currently do not keep any accounting information, so there is no way to estimate how much time it will take to sync the data.

> In /var/log/glusterfs/geo-replication/$session_dir/gsyncd.log there
> are some errors like:
>
> [2020-03-31 11:48:47.81269] E [syncdutils(worker /data/gfs/store1/8/brick):822:errlog] Popen: command returned error cmd=ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i /var/lib/glusterd/geo-replication/secret.pem -p 22 -oControlMaster=auto -S /tmp/gsyncd-aux-ssh-6aDWmc/206c4b2c3eb782ea2cf49ab5142bd68b.sock x.x.x.x /nonexistent/gsyncd slave <vol> x.x.x.x::<vol> --master-node x.x.x.x --master-node-id 9476b8bb-d7ee-489a-b083-875805343e67 --master-brick <brick_path> --local-node x.x.x.x 2 --local-node-id 426b564d-35d9-4291-980e-795903e9a386 --slave-timeout 120 --slave-log-level INFO --slave-gluster-log-level INFO --slave-gluster-command-dir /usr/sbin error=1
> [2020-03-31 11:48:47.81617] E [syncdutils(worker <brick_path>):826:logerr] Popen: ssh> failed with ValueError.
> [2020-03-31 11:48:47.390397] I [repce(agent <brick_path>):97:service_loop] RepceServer: terminating on reaching EOF.
>

If you are seeing this error at a regular interval, please check your ssh connection; it might have broken. If possible, please share the full traceback from both master and slave to debug the issue.

> In the brick logs I see stuff like:
>
> [2020-03-29 07:49:05.338947] E [fuse-bridge.c:4167:fuse_xattr_cbk] 0-glusterfs-fuse: extended attribute not supported by the backend storage
>
> I don't know if these are critical; from the rest of the logs it
> looks like data is traveling between the clusters.
>
> Any help will be greatly appreciated. Thank you in advance!
>
> Best regards,
> --
> alexander iliev

[1] https://docs.gluster.org/en/latest/Administrator%20Guide/Geo%20Replication/#status

/sunny
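P.S. A rough sketch of what I mean above -- the names in angle brackets are placeholders, so adjust them to your setup:

    # test the same ssh path gsyncd uses (key and port are the ones shown
    # in your gsyncd.log); the slave may restrict this key to gsyncd, so
    # the point here is only to rule out network/authentication failures
    ssh -p 22 -i /var/lib/glusterd/geo-replication/secret.pem <slavehost>

    # optionally set a checkpoint; `status detail` will then show whether
    # and when it completes, which gives you a concrete progress marker
    gluster volume geo-replication <mastervol> <slavehost>::<slavevol> config checkpoint now
    gluster volume geo-replication <mastervol> <slavehost>::<slavevol> status detail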