The relevant portions of the log appear to be as follows. Everything seemed fairly normal (though quite slow) until

[2015-10-08 15:31:26.471216] I [master(/data/gluster1/static/brick1):1249:crawl] _GMaster: finished hybrid crawl syncing, stime: (1444278018, 482251)
[2015-10-08 15:31:34.39248] I [syncdutils(/data/gluster1/static/brick1):220:finalize] <top>: exiting.
[2015-10-08 15:31:34.40934] I [repce(agent):92:service_loop] RepceServer: terminating on reaching EOF.
[2015-10-08 15:31:34.41220] I [syncdutils(agent):220:finalize] <top>: exiting.
[2015-10-08 15:31:35.615353] I [monitor(monitor):362:distribute] <top>: slave bricks: [{'host': 'palace', 'dir': '/data/gluster1/static/brick1'}, {'host': 'madonna', 'dir': '/data/gluster1/static/brick2'}]
[2015-10-08 15:31:35.616558] I [monitor(monitor):383:distribute] <top>: worker specs: [('/data/gluster1/static/brick1', 'ssh://root@palace:gluster://localhost:static', 1)]
[2015-10-08 15:31:35.748434] I [monitor(monitor):221:monitor] Monitor: ------------------------------------------------------------
[2015-10-08 15:31:35.748775] I [monitor(monitor):222:monitor] Monitor: starting gsyncd worker
[2015-10-08 15:31:35.837651] I [changelogagent(agent):75:__init__] ChangelogAgent: Agent listining...
[2015-10-08 15:31:35.841150] I [gsyncd(/data/gluster1/static/brick1):649:main_i] <top>: syncing: gluster://localhost:static -> ssh://root@palace:gluster://localhost:static
[2015-10-08 15:31:38.543379] I [master(/data/gluster1/static/brick1):83:gmaster_builder] <top>: setting up xsync change detection mode
[2015-10-08 15:31:38.543802] I [master(/data/gluster1/static/brick1):401:__init__] _GMaster: using 'tar over ssh' as the sync engine
[2015-10-08 15:31:38.544673] I [master(/data/gluster1/static/brick1):83:gmaster_builder] <top>: setting up xsync change detection mode
[2015-10-08 15:31:38.544924] I [master(/data/gluster1/static/brick1):401:__init__] _GMaster: using 'tar over ssh' as the sync engine
[2015-10-08 15:31:38.546163] I [master(/data/gluster1/static/brick1):83:gmaster_builder] <top>: setting up xsync change detection mode
[2015-10-08 15:31:38.546406] I [master(/data/gluster1/static/brick1):401:__init__] _GMaster: using 'tar over ssh' as the sync engine
[2015-10-08 15:31:38.548989] I [master(/data/gluster1/static/brick1):1220:register] _GMaster: xsync temp directory: /var/lib/misc/glusterfsd/static/ssh%3A%2F%2Froot%40palace%3Agluster%3A%2F%2F127.0.0.1%3Astatic/5f45950b672b0d32fa97e00350eca862/xsync
[2015-10-08 15:31:38.549267] I [master(/data/gluster1/static/brick1):1220:register] _GMaster: xsync temp directory: /var/lib/misc/glusterfsd/static/ssh%3A%2F%2Froot%40palace%3Agluster%3A%2F%2F127.0.0.1%3Astatic/5f45950b672b0d32fa97e00350eca862/xsync
[2015-10-08 15:31:38.549467] I [master(/data/gluster1/static/brick1):1220:register] _GMaster: xsync temp directory: /var/lib/misc/glusterfsd/static/ssh%3A%2F%2Froot%40palace%3Agluster%3A%2F%2F127.0.0.1%3Astatic/5f45950b672b0d32fa97e00350eca862/xsync
[2015-10-08 15:31:38.549632] I [resource(/data/gluster1/static/brick1):1432:service_loop] GLUSTER: Register time: 1444278698
[2015-10-08 15:31:38.582277] I [master(/data/gluster1/static/brick1):530:crawlwrap] _GMaster: primary master with volume id 3f9f810d-a988-4914-a5ca-5bd7b251a273 ...
[2015-10-08 15:31:38.584099] I [master(/data/gluster1/static/brick1):539:crawlwrap] _GMaster: crawl interval: 60 seconds
[2015-10-08 15:31:38.587405] I [master(/data/gluster1/static/brick1):1242:crawl] _GMaster: starting hybrid crawl..., stime: (1444278018, 482251)
[2015-10-08 15:31:38.588735] I [master(/data/gluster1/static/brick1):1249:crawl] _GMaster: finished hybrid crawl syncing, stime: (1444278018, 482251)
[2015-10-08 15:31:38.590116] I [master(/data/gluster1/static/brick1):530:crawlwrap] _GMaster: primary master with volume id 3f9f810d-a988-4914-a5ca-5bd7b251a273 ...
[2015-10-08 15:31:38.591582] I [master(/data/gluster1/static/brick1):539:crawlwrap] _GMaster: crawl interval: 60 seconds
[2015-10-08 15:31:38.593844] I [master(/data/gluster1/static/brick1):1242:crawl] _GMaster: starting hybrid crawl..., stime: (1444278018, 482251)
[2015-10-08 15:31:38.594832] I [master(/data/gluster1/static/brick1):1249:crawl] _GMaster: finished hybrid crawl syncing, stime: (1444278018, 482251)
[2015-10-08 15:32:38.641908] I [master(/data/gluster1/static/brick1):552:crawlwrap] _GMaster: 1 crawls, 0 turns
[2015-10-08 15:32:38.644370] I [master(/data/gluster1/static/brick1):1242:crawl] _GMaster: starting hybrid crawl..., stime: (1444278018, 482251)
[2015-10-08 15:32:39.646733] I [master(/data/gluster1/static/brick1):1252:crawl] _GMaster: processing xsync changelog /var/lib/misc/glusterfsd/static/ssh%3A%2F%2Froot%40palace%3Agluster%3A%2F%2F127.0.0.1%3Astatic/5f45950b672b0d32fa97e00350eca862/xsync/XSYNC-CHANGELOG.1444278758
[2015-10-08 15:32:40.857084] W [master(/data/gluster1/static/brick1):803:log_failures] _GMaster: ENTRY FAILED: ({'uid': 0, 'gfid': 'fc446c88-a5b7-468b-ac52-25b4225fe0cf', 'gid': 0, 'mode': 33188, 'entry': '.gfid/e63d740c-ce17-4107-af35-d4030d2ae466/formguide-2015-10-08-gosford-1.html', 'op': 'MKNOD'}, 17, '02489235-13c5-4232-8d6d-c7843bc5249b')
[2015-10-08 15:32:40.858580] W [master(/data/gluster1/static/brick1):803:log_failures] _GMaster: ENTRY FAILED: ({'uid': 0, 'gfid': 'e08813c5-055a-4354-94ec-f1b41a14b2a4', 'gid': 0, 'mode': 33188, 'entry': '.gfid/e63d740c-ce17-4107-af35-d4030d2ae466/formguide-2015-10-08-gosford-2.html', 'op': 'MKNOD'}, 17, '0abae047-5816-4199-8203-fa8b974dfef5')

...

[2015-10-08 15:33:38.236779] W [master(/data/gluster1/static/brick1):803:log_failures] _GMaster: ENTRY FAILED: ({'uid': 1000, 'gfid': 'a41a2ac7-8fec-46bd-a4cc-8d8794e5ee39', 'gid': 1000, 'mode': 33206, 'entry': '.gfid/553f4651-9c13-4788-89ea-67e1a6d5ab43/1PYhnxMyMMcQo8ukuyMsqq.png', 'op': 'MKNOD'}, 17, 'e047db7d-f96c-496f-8a83-5db8e41859ca')
[2015-10-08 15:33:38.237443] W [master(/data/gluster1/static/brick1):803:log_failures] _GMaster: ENTRY FAILED: ({'uid': 1000, 'gfid': '507f77db-0dc0-4d7f-9eb3-8f56b3e01765', 'gid': 1000, 'mode': 33206, 'entry': '.gfid/553f4651-9c13-4788-89ea-67e1a6d5ab43/17H7rpUIXGEQemM0wCoy6c.png', 'op': 'MKNOD'}, 17, 'ee7fa964-fc92-4008-b38a-e790fbbb1285')
[2015-10-08 15:33:38.238053] W [master(/data/gluster1/static/brick1):803:log_failures] _GMaster: ENTRY FAILED: ({'uid': 1000, 'gfid': '6c495557-6808-4ff9-98de-39afbbeeac82', 'gid': 1000, 'mode': 33206, 'entry': '.gfid/553f4651-9c13-4788-89ea-67e1a6d5ab43/3T3VvUQH44my0Eosiieeok.png', 'op': 'MKNOD'}, 17, 'cc6a75c4-0817-497e-912b-4442fd19db83')
[2015-10-08 15:33:43.615427] W [master(/data/gluster1/static/brick1):1010:process] _GMaster: changelogs XSYNC-CHANGELOG.1444278758 could not be processed - moving on...
[2015-10-08 15:33:43.616425] W [master(/data/gluster1/static/brick1):1014:process] _GMaster: SKIPPED GFID = 6c495557-6808-4ff9-98de-39afbbeeac82,16f94158-2f27-421b-9981-94d4197b2b3b,53d01d46-5724-4c77-846f-aacea7a3a447,9fbb536b-b7c6-41e1-8593-43e8a42b3fbe,1923ceff-d9a4-449e-b1c6-ce37c54d242c,3206332f-ed48-48d7-ad3f-cb82fbda0695,7696c570-edd5-481e-8cdc-3e...[truncated]

That type of entry repeats until

[2015-10-09 11:12:22.590574] I [master(/data/gluster1/static/brick1):1249:crawl] _GMaster: finished hybrid crawl syncing, stime: (1444349280, 617969)
[2015-10-09 11:13:22.650285] I [master(/data/gluster1/static/brick1):552:crawlwrap] _GMaster: 1 crawls, 1 turns
[2015-10-09 11:13:22.653459] I [master(/data/gluster1/static/brick1):1242:crawl] _GMaster: starting hybrid crawl..., stime: (1444349280, 617969)
[2015-10-09 11:13:22.670430] W [master(/data/gluster1/static/brick1):1366:Xcrawl] _GMaster: irregular xtime for ./racesoap/nominations/processed/.processed.2015-10-13.T.Ballina.V1.nomination.1444346457.247.Thj1Ly: ENOENT

and then there were no more logs until 2015-10-13.

Thanks,
Wade.
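An aside on the ENTRY FAILED warnings in the log excerpt above: the bare integer in each tuple (17) looks like an errno, and on Linux errno 17 is EEXIST ("File exists"), which would suggest gsyncd is refusing to create entries that already exist on the slave, presumably under a different gfid. That reading is an interpretation of the log format rather than anything stated in the thread; the errno mapping itself can be confirmed anywhere with a stock Python one-liner:

# shows the symbolic name and message for errno 17 (EEXIST / "File exists")
python -c 'import errno, os; print(errno.errorcode[17], os.strerror(17))'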
On 16/10/2015 4:33 pm, Aravinda wrote:
> Oh ok. I overlooked the status output. Please share the geo-replication logs from "james" and "hilton" nodes.
>
> regards
> Aravinda
>
> On 10/15/2015 05:55 PM, Wade Fitzpatrick wrote:
>> Well I'm kind of worried about the 3 million failures listed in the FAILURES column, the timestamp showing that syncing "stalled" 2 days ago and the fact that only half of the files have been transferred to the remote volume.
>>
>> On 15/10/2015 9:27 pm, Aravinda wrote:
>>> Status looks good. Two master bricks are Active and participating in syncing. Please let us know the issue you are observing.
>>> regards
>>> Aravinda
>>> On 10/15/2015 11:40 AM, Wade Fitzpatrick wrote:
>>>> I have twice now tried to configure geo-replication of our Stripe-Replicate volume to a remote Stripe volume but it always seems to have issues.
>>>>
>>>> root@james:~# gluster volume info
>>>>
>>>> Volume Name: gluster_shared_storage
>>>> Type: Replicate
>>>> Volume ID: 5f446a10-651b-4ce0-a46b-69871f498dbc
>>>> Status: Started
>>>> Number of Bricks: 1 x 4 = 4
>>>> Transport-type: tcp
>>>> Bricks:
>>>> Brick1: james:/data/gluster1/geo-rep-meta/brick
>>>> Brick2: cupid:/data/gluster1/geo-rep-meta/brick
>>>> Brick3: hilton:/data/gluster1/geo-rep-meta/brick
>>>> Brick4: present:/data/gluster1/geo-rep-meta/brick
>>>> Options Reconfigured:
>>>> performance.readdir-ahead: on
>>>>
>>>> Volume Name: static
>>>> Type: Striped-Replicate
>>>> Volume ID: 3f9f810d-a988-4914-a5ca-5bd7b251a273
>>>> Status: Started
>>>> Number of Bricks: 1 x 2 x 2 = 4
>>>> Transport-type: tcp
>>>> Bricks:
>>>> Brick1: james:/data/gluster1/static/brick1
>>>> Brick2: cupid:/data/gluster1/static/brick2
>>>> Brick3: hilton:/data/gluster1/static/brick3
>>>> Brick4: present:/data/gluster1/static/brick4
>>>> Options Reconfigured:
>>>> auth.allow: 10.x.*
>>>> features.scrub: Active
>>>> features.bitrot: on
>>>> performance.readdir-ahead: on
>>>> geo-replication.indexing: on
>>>> geo-replication.ignore-pid-check: on
>>>> changelog.changelog: on
>>>>
>>>> root@palace:~# gluster volume info
>>>>
>>>> Volume Name: static
>>>> Type: Stripe
>>>> Volume ID: 3de935db-329b-4876-9ca4-a0f8d5f184c3
>>>> Status: Started
>>>> Number of Bricks: 1 x 2 = 2
>>>> Transport-type: tcp
>>>> Bricks:
>>>> Brick1: palace:/data/gluster1/static/brick1
>>>> Brick2: madonna:/data/gluster1/static/brick2
>>>> Options Reconfigured:
>>>> features.scrub: Active
>>>> features.bitrot: on
>>>> performance.readdir-ahead: on
>>>>
>>>> root@james:~# gluster vol geo-rep static ssh://gluster-b1::static status detail
>>>>
>>>> MASTER NODE    MASTER VOL    MASTER BRICK                    SLAVE USER    SLAVE                       SLAVE NODE    STATUS     CRAWL STATUS       LAST_SYNCED            ENTRY    DATA    META    FAILURES    CHECKPOINT TIME    CHECKPOINT COMPLETED    CHECKPOINT COMPLETION TIME
>>>> ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>>>> james          static        /data/gluster1/static/brick1    root          ssh://gluster-b1::static    palace        Active     Changelog Crawl    2015-10-13 14:23:20    0        0       0       1952064     N/A                N/A                     N/A
>>>> hilton         static        /data/gluster1/static/brick3    root          ssh://gluster-b1::static    palace        Active     Changelog Crawl    N/A                    0        0       0       1008035     N/A                N/A                     N/A
>>>> present        static        /data/gluster1/static/brick4    root          ssh://gluster-b1::static    madonna       Passive    N/A                N/A                    N/A      N/A     N/A     N/A         N/A                N/A                     N/A
>>>> cupid          static        /data/gluster1/static/brick2    root          ssh://gluster-b1::static    madonna       Passive    N/A                N/A                    N/A      N/A     N/A     N/A         N/A                N/A                     N/A
>>>>
>>>> So just to clarify, data is striped over bricks 1 and 3; bricks 2 and 4 are the replica.
>>>>
>>>> Can someone help me diagnose the problem and find a solution?
>>>>
>>>> Thanks in advance,
>>>> Wade.
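A side note on the numbers in that status output: the two FAILURES counters (1952064 on james and 1008035 on hilton) add up to roughly the three million failures mentioned above, and they appear to correspond to the ENTRY FAILED / SKIPPED GFID warnings that gsyncd writes to the per-brick worker logs. If that is the case, the scale of the problem on each master node can be gauged straight from those logs; a rough, illustrative count on james (log directory as shown in the config output later in this thread) would be:

root@james:~# grep -c "ENTRY FAILED" /var/log/glusterfs/geo-replication/static/*.log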
I have now tried to re-initialise the whole geo-rep setup, but the replication slave went Faulty immediately. Any help here would be appreciated; I cannot even find how to recover a faulty node without recreating the geo-rep session.

root@james:~# gluster volume geo-replication static gluster-b1::static stop
Stopping geo-replication session between static & gluster-b1::static has been successful
root@james:~# gluster volume geo-replication static gluster-b1::static delete
Deleting geo-replication session between static & gluster-b1::static has been successful

I then destroyed and re-created the gluster-b1::static slave volume on the same bricks.

root@palace:~# gluster volume stop static
Stopping volume will make its data inaccessible. Do you want to continue? (y/n) y
volume stop: static: success
root@palace:~# gluster volume delete static
Deleting volume will erase all information about the volume. Do you want to continue? (y/n) y
volume delete: static: success
root@palace:~# gluster volume create static stripe 2 transport tcp palace:/data/gluster1/static/brick1 madonna:/data/gluster1/static/brick2
volume create: static: success: please start the volume to access data
root@palace:~# gluster volume info

Volume Name: static
Type: Stripe
Volume ID: dc14cd83-2736-4faf-8e11-c6d711ff8f56
Status: Created
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: palace:/data/gluster1/static/brick1
Brick2: madonna:/data/gluster1/static/brick2
Options Reconfigured:
performance.readdir-ahead: on

root@palace:~# gluster volume start static
volume start: static: success

Then I established the geo-rep session again:

root@james:~# gluster volume geo-replication static ssh://gluster-b1::static create
Creating geo-replication session between static & ssh://gluster-b1::static has been successful
root@james:~# gluster volume geo-replication static ssh://gluster-b1::static config use_meta_volume true
geo-replication config updated successfully
root@james:~# gluster volume geo-replication static ssh://gluster-b1::static config use-tarssh true
geo-replication config updated successfully
root@james:~# gluster volume geo-replication static ssh://gluster-b1::static config
special_sync_mode: partial
state_socket_unencoded: /var/lib/glusterd/geo-replication/static_gluster-b1_static/ssh%3A%2F%2Froot%40palace%3Agluster%3A%2F%2F127.0.0.1%3Astatic.socket
gluster_log_file: /var/log/glusterfs/geo-replication/static/ssh%3A%2F%2Froot%40palace%3Agluster%3A%2F%2F127.0.0.1%3Astatic.gluster.log
ssh_command: ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i /var/lib/glusterd/geo-replication/secret.pem
use_tarssh: true
ignore_deletes: false
change_detector: changelog
gluster_command_dir: /usr/sbin/
state_file: /var/lib/glusterd/geo-replication/static_gluster-b1_static/ssh%3A%2F%2Froot%40palace%3Agluster%3A%2F%2F127.0.0.1%3Astatic.status
remote_gsyncd: /nonexistent/gsyncd
log_file: /var/log/glusterfs/geo-replication/static/ssh%3A%2F%2Froot%40palace%3Agluster%3A%2F%2F127.0.0.1%3Astatic.log
changelog_log_file: /var/log/glusterfs/geo-replication/static/ssh%3A%2F%2Froot%40palace%3Agluster%3A%2F%2F127.0.0.1%3Astatic-changes.log
socketdir: /var/run/gluster
working_dir: /var/lib/misc/glusterfsd/static/ssh%3A%2F%2Froot%40palace%3Agluster%3A%2F%2F127.0.0.1%3Astatic
state_detail_file: /var/lib/glusterd/geo-replication/static_gluster-b1_static/ssh%3A%2F%2Froot%40palace%3Agluster%3A%2F%2F127.0.0.1%3Astatic-detail.status
use_meta_volume: true
ssh_command_tar: ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i /var/lib/glusterd/geo-replication/tar_ssh.pem
pid_file: /var/lib/glusterd/geo-replication/static_gluster-b1_static/ssh%3A%2F%2Froot%40palace%3Agluster%3A%2F%2F127.0.0.1%3Astatic.pid
georep_session_working_dir: /var/lib/glusterd/geo-replication/static_gluster-b1_static/
gluster_params: aux-gfid-mount acl

root@james:~# gluster volume geo-replication static ssh://gluster-b1::static start
Geo-replication session between static and ssh://gluster-b1::static does not exist.
geo-replication command failed
root@james:~# gluster volume geo-replication static ssh://gluster-b1::static status detail

MASTER NODE    MASTER VOL    MASTER BRICK                    SLAVE USER    SLAVE                       SLAVE NODE    STATUS    CRAWL STATUS    LAST_SYNCED    ENTRY    DATA    META    FAILURES    CHECKPOINT TIME    CHECKPOINT COMPLETED    CHECKPOINT COMPLETION TIME
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
james          static        /data/gluster1/static/brick1    root          ssh://gluster-b1::static    N/A           N/A       N/A             N/A            N/A      N/A     N/A     N/A         N/A                N/A                     N/A
hilton         static        /data/gluster1/static/brick3    root          ssh://gluster-b1::static    N/A           N/A       N/A             N/A            N/A      N/A     N/A     N/A         N/A                N/A                     N/A
present        static        /data/gluster1/static/brick4    root          ssh://gluster-b1::static    N/A           N/A       N/A             N/A            N/A      N/A     N/A     N/A         N/A                N/A                     N/A
cupid          static        /data/gluster1/static/brick2    root          ssh://gluster-b1::static    N/A           N/A       N/A             N/A            N/A      N/A     N/A     N/A         N/A                N/A                     N/A

root@james:~# gluster volume geo-replication static ssh://gluster-b1::static start
Starting geo-replication session between static & ssh://gluster-b1::static has been successful
root@james:~# gluster volume geo-replication static ssh://gluster-b1::static status detail

MASTER NODE    MASTER VOL    MASTER BRICK                    SLAVE USER    SLAVE                       SLAVE NODE    STATUS             CRAWL STATUS    LAST_SYNCED    ENTRY    DATA    META    FAILURES    CHECKPOINT TIME    CHECKPOINT COMPLETED    CHECKPOINT COMPLETION TIME
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
james          static        /data/gluster1/static/brick1    root          ssh://gluster-b1::static    N/A           Initializing...    N/A             N/A            N/A      N/A     N/A     N/A         N/A                N/A                     N/A
hilton         static        /data/gluster1/static/brick3    root          ssh://gluster-b1::static    palace        Active             Hybrid Crawl    N/A            0        0       0       0           N/A                N/A                     N/A
present        static        /data/gluster1/static/brick4    root          ssh://gluster-b1::static    madonna       Passive            N/A             N/A            N/A      N/A     N/A     N/A         N/A                N/A                     N/A
cupid          static        /data/gluster1/static/brick2    root          ssh://gluster-b1::static    madonna       Active             Hybrid Crawl    N/A            0        0       0       0           N/A                N/A                     N/A

root@james:~# gluster volume geo-replication static ssh://gluster-b1::static status detail

MASTER NODE    MASTER VOL    MASTER BRICK                    SLAVE USER    SLAVE                       SLAVE NODE    STATUS     CRAWL STATUS    LAST_SYNCED    ENTRY    DATA    META    FAILURES    CHECKPOINT TIME    CHECKPOINT COMPLETED    CHECKPOINT COMPLETION TIME
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
james          static        /data/gluster1/static/brick1    root          ssh://gluster-b1::static    N/A           Faulty     N/A             N/A            N/A      N/A     N/A     N/A         N/A                N/A                     N/A
hilton         static        /data/gluster1/static/brick3    root          ssh://gluster-b1::static    palace        Active     Hybrid Crawl    N/A            8191     8187    0       0           N/A                N/A                     N/A
present        static        /data/gluster1/static/brick4    root          ssh://gluster-b1::static    madonna       Passive    N/A             N/A            N/A      N/A     N/A     N/A         N/A                N/A                     N/A
cupid          static        /data/gluster1/static/brick2    root          ssh://gluster-b1::static    madonna       Active     Hybrid Crawl    N/A            8191     8187    0       0           N/A                N/A                     N/A
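When a brick shows up as Faulty the way the james brick does here, the usual next step is to check the per-brick worker log on that node, which normally carries the traceback that killed the worker. Going by the log_file path in the config output above (the URL-encoded filename is taken from that output and may differ per node), something along these lines on james would show the most recent failure:

root@james:~# tail -n 100 /var/log/glusterfs/geo-replication/static/ssh%3A%2F%2Froot%40palace%3Agluster%3A%2F%2F127.0.0.1%3Astatic.log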
Hi Wade,

There seems to be some issue in syncing the existing data in the volume using the Xsync crawl. (To give some background: when geo-rep is started, it first does a filesystem crawl (Xsync) to sync all the existing data to the slave, and then the session switches to changelog mode.) We are looking into this.

Any specific reason to go for a Stripe volume? Stripe volumes do not seem to have been extensively tested with geo-rep.

Thanks,
Saravana

On 10/19/2015 08:24 AM, Wade Fitzpatrick wrote:
> The relevant portions of the log appear to be as follows. Everything seemed fairly normal (though quite slow) until
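As a footnote to that background: which phase a session is in appears in the CRAWL STATUS column of status detail (Hybrid Crawl while the initial filesystem/Xsync crawl is running, Changelog Crawl once it has switched over), and the configured change detector can be queried with the same config syntax used earlier in the thread. For the session above this should report changelog, matching the config dump Wade posted:

root@james:~# gluster volume geo-replication static ssh://gluster-b1::static config change_detector
changelog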