David Gibbons
2015-May-05 12:20 UTC
[Gluster-users] Geo-Rep. 3.5.3, Missing Files, Incorrect "Files Pending"
Thank you, responses and further questions inline below.

> In master nodes, look for log messages. Let us know if you feel any issue
> in log messages. (/var/log/glusterfs/geo-replication/)

The workers have been transitioning between active and faulty. They throw an
error in the log (I believe it's related to rsync error 23 or something, but
I will have to isolate it again), then switch to faulty. A minute or so later
they are back to Active.

> Ideally, after the initial crawl geo-rep should switch to Changelog crawl.

Thanks for clarifying, I will wait and look for that. It appears that xsync
is the default, but I did change it to changelog yesterday. Which is the more
reliable option?

> Geo-rep doesn't have a persistent store of all path names and sync status.
> When geo-rep gets the list of files to be synced, it adds that number to the
> counter. But if the same files are modified again, the counter is
> incremented again. Numbers in the Status output will not match the number
> of files on disk.

When does it get reset back to 0? Or where are the 8191 files that it thinks
are out of sync stored? I would like to be able to sanity-check the progress.

Thanks,
Dave
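P.S. For reference, this is roughly how I switched the crawl mode and checked
it afterwards -- the slave host is redacted and I'm quoting the option name
(change_detector) from memory, so treat this as a sketch rather than exact
history:

    # show the current settings for this geo-rep session
    gluster volume geo-replication shares root@<slave-host>::bkpshares config

    # switch the change detection mechanism from xsync to changelog
    gluster volume geo-replication shares root@<slave-host>::bkpshares config change_detector changelog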
David Gibbons
2015-May-05 13:27 UTC
[Gluster-users] Geo-Rep. 3.5.3, Missing Files, Incorrect "Files Pending"
I caught one of the nodes transitioning into faulty mode, log output is below.

> In master nodes, look for log messages. Let us know if you feel any issue
> in log messages. (/var/log/glusterfs/geo-replication/)

When one of the nodes drops into "faulty", which happens periodically, this is
the type of output that appears in the log:

[root@gfs-a-1 ~]# tail /usr/local/var/log/glusterfs/geo-replication/shares/ssh%3A%2F%2Froot%4010.XX.XXX.X%3Agluster%3A%2F%2F127.0.0.1%3Abkpshares.log
[2015-05-05 09:22:58.140913] W [master(/mnt/a-1-shares-brick-2/brick):250:regjob] <top>: Rsync: .gfid/065c09f9-4502-4a2c-81fa-5e8fcaf22712 [errcode: 23]
[2015-05-05 09:22:58.152951] W [master(/mnt/a-1-shares-brick-2/brick):250:regjob] <top>: Rsync: .gfid/28a237a4-4346-48c5-bd1c-713273f591c7 [errcode: 23]
[2015-05-05 09:22:58.327603] W [master(/mnt/a-1-shares-brick-2/brick):250:regjob] <top>: Rsync: .gfid/5755db3e-e9d8-42d2-b415-890842b086ae [errcode: 23]
[2015-05-05 09:22:58.336714] W [master(/mnt/a-1-shares-brick-2/brick):250:regjob] <top>: Rsync: .gfid/0b7fc219-1e31-4e66-865f-5ae1c26d5e54 [errcode: 23]
[2015-05-05 09:22:58.360308] W [master(/mnt/a-1-shares-brick-2/brick):250:regjob] <top>: Rsync: .gfid/955cd0e4-dd06-4db6-9391-34dbf72c9b06 [errcode: 23]
[2015-05-05 09:22:58.367522] W [master(/mnt/a-1-shares-brick-2/brick):250:regjob] <top>: Rsync: .gfid/1d455725-c3e1-4111-92e5-335610d3f513 [errcode: 23]
[2015-05-05 09:22:58.368226] W [master(/mnt/a-1-shares-brick-2/brick):250:regjob] <top>: Rsync: .gfid/7ce881ae-3491-4e21-b38b-0a27fb620c74 [errcode: 23]
[2015-05-05 09:22:58.368959] W [master(/mnt/a-1-shares-brick-2/brick):250:regjob] <top>: Rsync: .gfid/056732c1-1537-4925-a30c-b905c110a5b2 [errcode: 23]
[2015-05-05 09:22:58.369635] W [master(/mnt/a-1-shares-brick-2/brick):250:regjob] <top>: Rsync: .gfid/8c58d6c5-9975-43c6-8f4c-2a92337f7350 [errcode: 23]
[2015-05-05 09:22:58.369790] W [master(/mnt/a-1-shares-brick-2/brick):877:process] _GMaster: incomplete sync, retrying changelogs: XSYNC-CHANGELOG.1430830891

When the node is in "active" mode, I get a lot of log output that resembles this:

[2015-05-05 09:23:54.735502] W [master(/mnt/a-1-shares-brick-3/brick):877:process] _GMaster: incomplete sync, retrying changelogs: XSYNC-CHANGELOG.1430832227
[2015-05-05 09:23:55.449265] W [master(/mnt/a-1-shares-brick-3/brick):250:regjob] <top>: Rsync: .gfid/0665be16-04e9-4cbe-a2c9-a633caa8c79d [errcode: 23]
[2015-05-05 09:23:55.449491] W [master(/mnt/a-1-shares-brick-3/brick):877:process] _GMaster: incomplete sync, retrying changelogs: XSYNC-CHANGELOG.1430832227
[2015-05-05 09:23:56.277033] W [master(/mnt/a-1-shares-brick-3/brick):250:regjob] <top>: Rsync: .gfid/0665be16-04e9-4cbe-a2c9-a633caa8c79d [errcode: 23]
[2015-05-05 09:23:56.277259] W [master(/mnt/a-1-shares-brick-3/brick):860:process] _GMaster: changelogs XSYNC-CHANGELOG.1430832227 could not be processed - moving on...
[2015-05-05 09:23:56.294038] W [master(/mnt/a-1-shares-brick-3/brick):862:process] _GMaster: SKIPPED GFID
[2015-05-05 09:23:56.381592] I [master(/mnt/a-1-shares-brick-3/brick):1130:crawl] _GMaster: finished hybrid crawl syncing
[2015-05-05 09:24:24.404884] I [master(/mnt/a-1-shares-brick-4/brick):445:crawlwrap] _GMaster: 1 crawls, 1 turns
[2015-05-05 09:24:24.437452] I [master(/mnt/a-1-shares-brick-4/brick):1124:crawl] _GMaster: starting hybrid crawl...
[2015-05-05 09:24:24.588865] I [master(/mnt/a-1-shares-brick-1/brick):1133:crawl] _GMaster: processing xsync changelog /usr/local/var/run/gluster/shares/ssh%3A%2F%2Froot%4010.XX.XXX.X%3Agluster%3A%2F%2F127.0.0.1%3Abkpshares/9d9a72f468c582609e97e8929e58b9ff/xsync/XSYNC-CHANGELOG.1430832135
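Side note, mostly for my own records: rsync exit code 23 is "partial transfer
due to error". To figure out which actual files sit behind the .gfid entries
above, I've been resolving them on the brick with something along these
lines -- a rough sketch that assumes the usual .glusterfs/<aa>/<bb>/<gfid>
layout and a regular file (directories are symlinks under .glusterfs):

    # GFID and brick path taken from one of the warnings above
    GFID=065c09f9-4502-4a2c-81fa-5e8fcaf22712
    BRICK=/mnt/a-1-shares-brick-2/brick

    # the GFID hardlink lives under .glusterfs/<first two chars>/<next two chars>/
    ls -l "$BRICK/.glusterfs/${GFID:0:2}/${GFID:2:2}/$GFID"

    # find the real path(s) sharing the same inode, skipping .glusterfs itself
    find "$BRICK" -samefile "$BRICK/.glusterfs/${GFID:0:2}/${GFID:2:2}/$GFID" -not -path "*/.glusterfs/*"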
This raises a couple of questions for me:

1. Are the files behind these errcode: 23 warnings ones that have been
   deleted/renamed since the changelog was created?
2. Is it correct/expected for the node to drop into faulty and then recover
   itself to active periodically?

Thank you again for your assistance!

Dave
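P.S. To sanity-check whether anything is actually being synced while the
workers flap between Active and faulty, I've been watching the per-brick
counters with something like the command below (slave host redacted; the
exact columns reported by "status detail" may differ slightly on 3.5.3, so
take it as a sketch):

    # FILES SYNCD / FILES PENDING / FILES SKIPPED per brick, refreshed every 30s
    watch -n 30 'gluster volume geo-replication shares root@<slave-host>::bkpshares status detail'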