David Gibbons
2015-May-05 12:20 UTC
[Gluster-users] Geo-Rep. 3.5.3, Missing Files, Incorrect "Files Pending"
Thank you, responses and further questions inline below.

> In master nodes, look for log messages. Let us know if you feel any issue
> in log messages. (/var/log/glusterfs/geo-replication/)

The workers have been transitioning between active and faulty. They throw an
error in the log (I believe it's related to rsync error 23 or something, but
I will have to isolate it again), then switch to faulty. A minute or so later
they are back to Active.

> Ideally, after the initial crawl geo-rep should switch to Changelog crawl.

Thanks for clarifying, I will wait and look for that. It appears that xsync
is the default, but I did change it to changelog yesterday. Which is the more
reliable option?

> Geo-rep doesn't have a persistent store of all path names and sync status.
> When geo-rep gets the list of files to be synced, it adds that number to the
> counter. But if the same files are modified again, the counter is
> incremented again. Numbers in the Status output will not match the number
> of files on disk.

When does it get reset back to 0? Or where are the 8191 files that it thinks
are out of sync stored? I would like to be able to sanity-check the progress.

Thanks,
Dave
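P.S. For reference, this is roughly how I switched the crawl mode and checked
it afterwards -- the slave host is redacted and I'm quoting the option name
(change_detector) from memory, so treat this as a sketch rather than exact
history:

    # show the current settings for this geo-rep session
    gluster volume geo-replication shares root@<slave-host>::bkpshares config

    # switch the change detection mechanism from xsync to changelog
    gluster volume geo-replication shares root@<slave-host>::bkpshares config change_detector changelog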
David Gibbons
2015-May-05 13:27 UTC
[Gluster-users] Geo-Rep. 3.5.3, Missing Files, Incorrect "Files Pending"
I caught one of the nodes transitioning into faulty mode, log output is below.

> In master nodes, look for log messages. Let us know if you feel any issue
> in log messages. (/var/log/glusterfs/geo-replication/)

When one of the nodes drops into "faulty", which happens periodically, this is
the type of output that appears in the log:

[root@gfs-a-1 ~]# tail /usr/local/var/log/glusterfs/geo-replication/shares/ssh%3A%2F%2Froot%4010.XX.XXX.X%3Agluster%3A%2F%2F127.0.0.1%3Abkpshares.log
[2015-05-05 09:22:58.140913] W [master(/mnt/a-1-shares-brick-2/brick):250:regjob] <top>: Rsync: .gfid/065c09f9-4502-4a2c-81fa-5e8fcaf22712 [errcode: 23]
[2015-05-05 09:22:58.152951] W [master(/mnt/a-1-shares-brick-2/brick):250:regjob] <top>: Rsync: .gfid/28a237a4-4346-48c5-bd1c-713273f591c7 [errcode: 23]
[2015-05-05 09:22:58.327603] W [master(/mnt/a-1-shares-brick-2/brick):250:regjob] <top>: Rsync: .gfid/5755db3e-e9d8-42d2-b415-890842b086ae [errcode: 23]
[2015-05-05 09:22:58.336714] W [master(/mnt/a-1-shares-brick-2/brick):250:regjob] <top>: Rsync: .gfid/0b7fc219-1e31-4e66-865f-5ae1c26d5e54 [errcode: 23]
[2015-05-05 09:22:58.360308] W [master(/mnt/a-1-shares-brick-2/brick):250:regjob] <top>: Rsync: .gfid/955cd0e4-dd06-4db6-9391-34dbf72c9b06 [errcode: 23]
[2015-05-05 09:22:58.367522] W [master(/mnt/a-1-shares-brick-2/brick):250:regjob] <top>: Rsync: .gfid/1d455725-c3e1-4111-92e5-335610d3f513 [errcode: 23]
[2015-05-05 09:22:58.368226] W [master(/mnt/a-1-shares-brick-2/brick):250:regjob] <top>: Rsync: .gfid/7ce881ae-3491-4e21-b38b-0a27fb620c74 [errcode: 23]
[2015-05-05 09:22:58.368959] W [master(/mnt/a-1-shares-brick-2/brick):250:regjob] <top>: Rsync: .gfid/056732c1-1537-4925-a30c-b905c110a5b2 [errcode: 23]
[2015-05-05 09:22:58.369635] W [master(/mnt/a-1-shares-brick-2/brick):250:regjob] <top>: Rsync: .gfid/8c58d6c5-9975-43c6-8f4c-2a92337f7350 [errcode: 23]
[2015-05-05 09:22:58.369790] W [master(/mnt/a-1-shares-brick-2/brick):877:process] _GMaster: incomplete sync, retrying changelogs: XSYNC-CHANGELOG.1430830891

When the node is in "active" mode, I get a lot of log output that resembles this:

[2015-05-05 09:23:54.735502] W [master(/mnt/a-1-shares-brick-3/brick):877:process] _GMaster: incomplete sync, retrying changelogs: XSYNC-CHANGELOG.1430832227
[2015-05-05 09:23:55.449265] W [master(/mnt/a-1-shares-brick-3/brick):250:regjob] <top>: Rsync: .gfid/0665be16-04e9-4cbe-a2c9-a633caa8c79d [errcode: 23]
[2015-05-05 09:23:55.449491] W [master(/mnt/a-1-shares-brick-3/brick):877:process] _GMaster: incomplete sync, retrying changelogs: XSYNC-CHANGELOG.1430832227
[2015-05-05 09:23:56.277033] W [master(/mnt/a-1-shares-brick-3/brick):250:regjob] <top>: Rsync: .gfid/0665be16-04e9-4cbe-a2c9-a633caa8c79d [errcode: 23]
[2015-05-05 09:23:56.277259] W [master(/mnt/a-1-shares-brick-3/brick):860:process] _GMaster: changelogs XSYNC-CHANGELOG.1430832227 could not be processed - moving on...
[2015-05-05 09:23:56.294038] W [master(/mnt/a-1-shares-brick-3/brick):862:process] _GMaster: SKIPPED GFID
[2015-05-05 09:23:56.381592] I [master(/mnt/a-1-shares-brick-3/brick):1130:crawl] _GMaster: finished hybrid crawl syncing
[2015-05-05 09:24:24.404884] I [master(/mnt/a-1-shares-brick-4/brick):445:crawlwrap] _GMaster: 1 crawls, 1 turns
[2015-05-05 09:24:24.437452] I [master(/mnt/a-1-shares-brick-4/brick):1124:crawl] _GMaster: starting hybrid crawl...
[2015-05-05 09:24:24.588865] I [master(/mnt/a-1-shares-brick-1/brick):1133:crawl] _GMaster: processing xsync changelog /usr/local/var/run/gluster/shares/ssh%3A%2F%2Froot%4010.XX.XXX.X%3Agluster%3A%2F%2F127.0.0.1%3Abkpshares/9d9a72f468c582609e97e8929e58b9ff/xsync/XSYNC-CHANGELOG.1430832135
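Side note, mostly for my own records: rsync exit code 23 is "partial transfer
due to error". To figure out which actual files sit behind the .gfid entries
above, I've been resolving them on the brick with something along these
lines -- a rough sketch that assumes the usual .glusterfs/<aa>/<bb>/<gfid>
layout and a regular file (directories are symlinks under .glusterfs):

    # GFID and brick path taken from one of the warnings above
    GFID=065c09f9-4502-4a2c-81fa-5e8fcaf22712
    BRICK=/mnt/a-1-shares-brick-2/brick

    # the GFID hardlink lives under .glusterfs/<first two chars>/<next two chars>/
    ls -l "$BRICK/.glusterfs/${GFID:0:2}/${GFID:2:2}/$GFID"

    # find the real path(s) sharing the same inode, skipping .glusterfs itself
    find "$BRICK" -samefile "$BRICK/.glusterfs/${GFID:0:2}/${GFID:2:2}/$GFID" -not -path "*/.glusterfs/*"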
This raises a couple of questions for me:

1. Are the files behind these errcode: 23 warnings ones that have been
   deleted/renamed since the changelog was created?
2. Is it correct/expected for the node to drop into faulty and then recover
   itself to active periodically?

Thank you again for your assistance!

Dave
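P.S. To sanity-check whether anything is actually being synced while the
workers flap between Active and faulty, I've been watching the per-brick
counters with something like the command below (slave host redacted; the
exact columns reported by "status detail" may differ slightly on 3.5.3, so
take it as a sketch):

    # FILES SYNCD / FILES PENDING / FILES SKIPPED per brick, refreshed every 30s
    watch -n 30 'gluster volume geo-replication shares root@<slave-host>::bkpshares status detail'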