The relevant portions of the log appear to be as follows. Everything
seemed fairly normal (though quite slow) until:
[2015-10-08 15:31:26.471216] I
[master(/data/gluster1/static/brick1):1249:crawl] _GMaster: finished
hybrid crawl syncing, stime: (1444278018, 482251)
[2015-10-08 15:31:34.39248] I
[syncdutils(/data/gluster1/static/brick1):220:finalize] <top>: exiting.
[2015-10-08 15:31:34.40934] I [repce(agent):92:service_loop]
RepceServer: terminating on reaching EOF.
[2015-10-08 15:31:34.41220] I [syncdutils(agent):220:finalize] <top>:
exiting.
[2015-10-08 15:31:35.615353] I [monitor(monitor):362:distribute] <top>:
slave bricks: [{'host': 'palace', 'dir': '/data/gluster1/static/brick1'},
{'host': 'madonna', 'dir': '/data/gluster1/static/brick2'}]
[2015-10-08 15:31:35.616558] I [monitor(monitor):383:distribute] <top>:
worker specs: [('/data/gluster1/static/brick1',
'ssh://root@palace:gluster://localhost:static', 1)]
[2015-10-08 15:31:35.748434] I [monitor(monitor):221:monitor] Monitor:
------------------------------------------------------------
[2015-10-08 15:31:35.748775] I [monitor(monitor):222:monitor] Monitor:
starting gsyncd worker
[2015-10-08 15:31:35.837651] I [changelogagent(agent):75:__init__]
ChangelogAgent: Agent listining...
[2015-10-08 15:31:35.841150] I
[gsyncd(/data/gluster1/static/brick1):649:main_i] <top>: syncing:
gluster://localhost:static -> ssh://root@palace:gluster://localhost:static
[2015-10-08 15:31:38.543379] I
[master(/data/gluster1/static/brick1):83:gmaster_builder] <top>: setting
up xsync change detection mode
[2015-10-08 15:31:38.543802] I
[master(/data/gluster1/static/brick1):401:__init__] _GMaster: using 'tar
over ssh' as the sync engine
[2015-10-08 15:31:38.544673] I
[master(/data/gluster1/static/brick1):83:gmaster_builder] <top>: setting
up xsync change detection mode
[2015-10-08 15:31:38.544924] I
[master(/data/gluster1/static/brick1):401:__init__] _GMaster: using 'tar
over ssh' as the sync engine
[2015-10-08 15:31:38.546163] I
[master(/data/gluster1/static/brick1):83:gmaster_builder] <top>: setting
up xsync change detection mode
[2015-10-08 15:31:38.546406] I
[master(/data/gluster1/static/brick1):401:__init__] _GMaster: using 'tar
over ssh' as the sync engine
[2015-10-08 15:31:38.548989] I
[master(/data/gluster1/static/brick1):1220:register] _GMaster: xsync
temp directory:
/var/lib/misc/glusterfsd/static/ssh%3A%2F%2Froot%40palace%3Agluster%3A%2F%2F127.0.0.1%3Astatic/5f45950b672b0d32fa97e00350eca862/xsync
[2015-10-08 15:31:38.549267] I
[master(/data/gluster1/static/brick1):1220:register] _GMaster: xsync
temp directory:
/var/lib/misc/glusterfsd/static/ssh%3A%2F%2Froot%40palace%3Agluster%3A%2F%2F127.0.0.1%3Astatic/5f45950b672b0d32fa97e00350eca862/xsync
[2015-10-08 15:31:38.549467] I
[master(/data/gluster1/static/brick1):1220:register] _GMaster: xsync
temp directory:
/var/lib/misc/glusterfsd/static/ssh%3A%2F%2Froot%40palace%3Agluster%3A%2F%2F127.0.0.1%3Astatic/5f45950b672b0d32fa97e00350eca862/xsync
[2015-10-08 15:31:38.549632] I
[resource(/data/gluster1/static/brick1):1432:service_loop] GLUSTER:
Register time: 1444278698
[2015-10-08 15:31:38.582277] I
[master(/data/gluster1/static/brick1):530:crawlwrap] _GMaster: primary
master with volume id 3f9f810d-a988-4914-a5ca-5bd7b251a273 ...
[2015-10-08 15:31:38.584099] I
[master(/data/gluster1/static/brick1):539:crawlwrap] _GMaster: crawl
interval: 60 seconds
[2015-10-08 15:31:38.587405] I
[master(/data/gluster1/static/brick1):1242:crawl] _GMaster: starting
hybrid crawl..., stime: (1444278018, 482251)
[2015-10-08 15:31:38.588735] I
[master(/data/gluster1/static/brick1):1249:crawl] _GMaster: finished
hybrid crawl syncing, stime: (1444278018, 482251)
[2015-10-08 15:31:38.590116] I
[master(/data/gluster1/static/brick1):530:crawlwrap] _GMaster: primary
master with volume id 3f9f810d-a988-4914-a5ca-5bd7b251a273 ...
[2015-10-08 15:31:38.591582] I
[master(/data/gluster1/static/brick1):539:crawlwrap] _GMaster: crawl
interval: 60 seconds
[2015-10-08 15:31:38.593844] I
[master(/data/gluster1/static/brick1):1242:crawl] _GMaster: starting
hybrid crawl..., stime: (1444278018, 482251)
[2015-10-08 15:31:38.594832] I
[master(/data/gluster1/static/brick1):1249:crawl] _GMaster: finished
hybrid crawl syncing, stime: (1444278018, 482251)
[2015-10-08 15:32:38.641908] I
[master(/data/gluster1/static/brick1):552:crawlwrap] _GMaster: 1 crawls,
0 turns
[2015-10-08 15:32:38.644370] I
[master(/data/gluster1/static/brick1):1242:crawl] _GMaster: starting
hybrid crawl..., stime: (1444278018, 482251)
[2015-10-08 15:32:39.646733] I
[master(/data/gluster1/static/brick1):1252:crawl] _GMaster: processing
xsync changelog
/var/lib/misc/glusterfsd/static/ssh%3A%2F%2Froot%40palace%3Agluster%3A%2F%2F127.0.0.1%3Astatic/5f45950b672b0d32fa97e00350eca862/xsync/XSYNC-CHANGELOG.1444278758
[2015-10-08 15:32:40.857084] W
[master(/data/gluster1/static/brick1):803:log_failures] _GMaster: ENTRY
FAILED: ({'uid': 0, 'gfid': 'fc446c88-a5b7-468b-ac52-25b4225fe0cf',
'gid': 0, 'mode': 33188, 'entry':
'.gfid/e63d740c-ce17-4107-af35-d4030d2ae466/formguide-2015-10-08-gosford-1.html',
'op': 'MKNOD'}, 17, '02489235-13c5-4232-8d6d-c7843bc5249b')
[2015-10-08 15:32:40.858580] W
[master(/data/gluster1/static/brick1):803:log_failures] _GMaster: ENTRY
FAILED: ({'uid': 0, 'gfid': 'e08813c5-055a-4354-94ec-f1b41a14b2a4',
'gid': 0, 'mode': 33188, 'entry':
'.gfid/e63d740c-ce17-4107-af35-d4030d2ae466/formguide-2015-10-08-gosford-2.html',
'op': 'MKNOD'}, 17, '0abae047-5816-4199-8203-fa8b974dfef5')
...
[2015-10-08 15:33:38.236779] W
[master(/data/gluster1/static/brick1):803:log_failures] _GMaster: ENTRY
FAILED: ({'uid': 1000, 'gfid': 'a41a2ac7-8fec-46bd-a4cc-8d8794e5ee39',
'gid': 1000, 'mode': 33206, 'entry':
'.gfid/553f4651-9c13-4788-89ea-67e1a6d5ab43/1PYhnxMyMMcQo8ukuyMsqq.png',
'op': 'MKNOD'}, 17, 'e047db7d-f96c-496f-8a83-5db8e41859ca')
[2015-10-08 15:33:38.237443] W
[master(/data/gluster1/static/brick1):803:log_failures] _GMaster: ENTRY
FAILED: ({'uid': 1000, 'gfid': '507f77db-0dc0-4d7f-9eb3-8f56b3e01765',
'gid': 1000, 'mode': 33206, 'entry':
'.gfid/553f4651-9c13-4788-89ea-67e1a6d5ab43/17H7rpUIXGEQemM0wCoy6c.png',
'op': 'MKNOD'}, 17, 'ee7fa964-fc92-4008-b38a-e790fbbb1285')
[2015-10-08 15:33:38.238053] W
[master(/data/gluster1/static/brick1):803:log_failures] _GMaster: ENTRY
FAILED: ({'uid': 1000, 'gfid': '6c495557-6808-4ff9-98de-39afbbeeac82',
'gid': 1000, 'mode': 33206, 'entry':
'.gfid/553f4651-9c13-4788-89ea-67e1a6d5ab43/3T3VvUQH44my0Eosiieeok.png',
'op': 'MKNOD'}, 17, 'cc6a75c4-0817-497e-912b-4442fd19db83')
[2015-10-08 15:33:43.615427] W
[master(/data/gluster1/static/brick1):1010:process] _GMaster: changelogs
XSYNC-CHANGELOG.1444278758 could not be processed - moving on...
[2015-10-08 15:33:43.616425] W
[master(/data/gluster1/static/brick1):1014:process] _GMaster: SKIPPED
GFID =
6c495557-6808-4ff9-98de-39afbbeeac82,16f94158-2f27-421b-9981-94d4197b2b3b,53d01d46-5724-4c77-846f-aacea7a3a447,9fbb536b-b7c6-41e1-8593-43e8a42b3fbe,1923ceff-d9a4-449e-b1c6-ce37c54d242c,3206332f-ed48-48d7-ad3f-cb82fbda0695,7696c570-edd5-481e-8cdc-3e...[truncated]
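For what it's worth, the bare 17 in each ENTRY FAILED tuple looks like a
POSIX errno value; decoding it suggests the MKNOD operations are failing
with EEXIST, i.e. the entries already exist on the slave. A quick sanity
check (this assumes gsyncd is logging the raw errno there):

```python
import errno
import os

# Decode the numeric error code seen in the ENTRY FAILED tuples.
code = 17
print(errno.errorcode[code])  # symbolic name of errno 17
print(os.strerror(code))      # human-readable description
```

On Linux this prints EEXIST / "File exists".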
That type of entry repeats until:
[2015-10-09 11:12:22.590574] I
[master(/data/gluster1/static/brick1):1249:crawl] _GMaster: finished
hybrid crawl syncing, stime: (1444349280, 617969)
[2015-10-09 11:13:22.650285] I
[master(/data/gluster1/static/brick1):552:crawlwrap] _GMaster: 1 crawls,
1 turns
[2015-10-09 11:13:22.653459] I
[master(/data/gluster1/static/brick1):1242:crawl] _GMaster: starting
hybrid crawl..., stime: (1444349280, 617969)
[2015-10-09 11:13:22.670430] W
[master(/data/gluster1/static/brick1):1366:Xcrawl] _GMaster: irregular
xtime for
./racesoap/nominations/processed/.processed.2015-10-13.T.Ballina.V1.nomination.1444346457.247.Thj1Ly:
ENOENT
and then there were no more logs until 2015-10-13.
Thanks,
Wade.
On 16/10/2015 4:33 pm, Aravinda wrote:
> Oh ok. I overlooked the status output. Please share the
> geo-replication logs from "james" and "hilton" nodes.
>
> regards
> Aravinda
>
> On 10/15/2015 05:55 PM, Wade Fitzpatrick wrote:
>> Well I'm kind of worried about the 3 million failures listed in the
>> FAILURES column, the timestamp showing that syncing "stalled" 2 days
>> ago and the fact that only half of the files have been transferred to
>> the remote volume.
>>
>> On 15/10/2015 9:27 pm, Aravinda wrote:
>>> Status looks good. Two master bricks are Active and participating in
>>> syncing. Please let us know the issue you are observing.
>>> regards
>>> Aravinda
>>> On 10/15/2015 11:40 AM, Wade Fitzpatrick wrote:
>>>> I have twice now tried to configure geo-replication of our
>>>> Stripe-Replicate volume to a remote Stripe volume but it always
>>>> seems to have issues.
>>>>
>>>> root@james:~# gluster volume info
>>>>
>>>> Volume Name: gluster_shared_storage
>>>> Type: Replicate
>>>> Volume ID: 5f446a10-651b-4ce0-a46b-69871f498dbc
>>>> Status: Started
>>>> Number of Bricks: 1 x 4 = 4
>>>> Transport-type: tcp
>>>> Bricks:
>>>> Brick1: james:/data/gluster1/geo-rep-meta/brick
>>>> Brick2: cupid:/data/gluster1/geo-rep-meta/brick
>>>> Brick3: hilton:/data/gluster1/geo-rep-meta/brick
>>>> Brick4: present:/data/gluster1/geo-rep-meta/brick
>>>> Options Reconfigured:
>>>> performance.readdir-ahead: on
>>>>
>>>> Volume Name: static
>>>> Type: Striped-Replicate
>>>> Volume ID: 3f9f810d-a988-4914-a5ca-5bd7b251a273
>>>> Status: Started
>>>> Number of Bricks: 1 x 2 x 2 = 4
>>>> Transport-type: tcp
>>>> Bricks:
>>>> Brick1: james:/data/gluster1/static/brick1
>>>> Brick2: cupid:/data/gluster1/static/brick2
>>>> Brick3: hilton:/data/gluster1/static/brick3
>>>> Brick4: present:/data/gluster1/static/brick4
>>>> Options Reconfigured:
>>>> auth.allow: 10.x.*
>>>> features.scrub: Active
>>>> features.bitrot: on
>>>> performance.readdir-ahead: on
>>>> geo-replication.indexing: on
>>>> geo-replication.ignore-pid-check: on
>>>> changelog.changelog: on
>>>>
>>>> root@palace:~# gluster volume info
>>>>
>>>> Volume Name: static
>>>> Type: Stripe
>>>> Volume ID: 3de935db-329b-4876-9ca4-a0f8d5f184c3
>>>> Status: Started
>>>> Number of Bricks: 1 x 2 = 2
>>>> Transport-type: tcp
>>>> Bricks:
>>>> Brick1: palace:/data/gluster1/static/brick1
>>>> Brick2: madonna:/data/gluster1/static/brick2
>>>> Options Reconfigured:
>>>> features.scrub: Active
>>>> features.bitrot: on
>>>> performance.readdir-ahead: on
>>>>
>>>> root@james:~# gluster vol geo-rep static ssh://gluster-b1::static
>>>> status detail
>>>>
>>>> MASTER NODE    MASTER VOL    MASTER BRICK                    SLAVE USER    SLAVE                       SLAVE NODE    STATUS     CRAWL STATUS       LAST_SYNCED            ENTRY    DATA    META    FAILURES    CHECKPOINT TIME    CHECKPOINT COMPLETED    CHECKPOINT COMPLETION TIME
>>>> ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>>>> james          static        /data/gluster1/static/brick1    root          ssh://gluster-b1::static    palace        Active     Changelog Crawl    2015-10-13 14:23:20    0        0       0       1952064     N/A                N/A                     N/A
>>>> hilton         static        /data/gluster1/static/brick3    root          ssh://gluster-b1::static    palace        Active     Changelog Crawl    N/A                    0        0       0       1008035     N/A                N/A                     N/A
>>>> present        static        /data/gluster1/static/brick4    root          ssh://gluster-b1::static    madonna       Passive    N/A                N/A                    N/A      N/A     N/A     N/A         N/A                N/A                     N/A
>>>> cupid          static        /data/gluster1/static/brick2    root          ssh://gluster-b1::static    madonna       Passive    N/A                N/A                    N/A      N/A     N/A     N/A         N/A                N/A                     N/A
>>>>
>>>>
>>>> So just to clarify, data is striped over bricks 1 and 3; bricks 2
>>>> and 4 are the replica.
>>>>
>>>> Can someone help me diagnose the problem and find a solution?
>>>>
>>>> Thanks in advance,
>>>> Wade.
>>>>
>>>>
>>>> _______________________________________________
>>>> Gluster-users mailing list
>>>> Gluster-users@gluster.org
>>>> http://www.gluster.org/mailman/listinfo/gluster-users
>>>
>>
>>
>