The relevant portions of the log appear to be as follows. Everything seemed fairly normal (though quite slow) until

[2015-10-08 15:31:26.471216] I [master(/data/gluster1/static/brick1):1249:crawl] _GMaster: finished hybrid crawl syncing, stime: (1444278018, 482251)
[2015-10-08 15:31:34.39248] I [syncdutils(/data/gluster1/static/brick1):220:finalize] <top>: exiting.
[2015-10-08 15:31:34.40934] I [repce(agent):92:service_loop] RepceServer: terminating on reaching EOF.
[2015-10-08 15:31:34.41220] I [syncdutils(agent):220:finalize] <top>: exiting.
[2015-10-08 15:31:35.615353] I [monitor(monitor):362:distribute] <top>: slave bricks: [{'host': 'palace', 'dir': '/data/gluster1/static/brick1'}, {'host': 'madonna', 'dir': '/data/gluster1/static/brick2'}]
[2015-10-08 15:31:35.616558] I [monitor(monitor):383:distribute] <top>: worker specs: [('/data/gluster1/static/brick1', 'ssh://root@palace:gluster://localhost:static', 1)]
[2015-10-08 15:31:35.748434] I [monitor(monitor):221:monitor] Monitor: ------------------------------------------------------------
[2015-10-08 15:31:35.748775] I [monitor(monitor):222:monitor] Monitor: starting gsyncd worker
[2015-10-08 15:31:35.837651] I [changelogagent(agent):75:__init__] ChangelogAgent: Agent listining...
[2015-10-08 15:31:35.841150] I [gsyncd(/data/gluster1/static/brick1):649:main_i] <top>: syncing: gluster://localhost:static -> ssh://root@palace:gluster://localhost:static
[2015-10-08 15:31:38.543379] I [master(/data/gluster1/static/brick1):83:gmaster_builder] <top>: setting up xsync change detection mode
[2015-10-08 15:31:38.543802] I [master(/data/gluster1/static/brick1):401:__init__] _GMaster: using 'tar over ssh' as the sync engine
[2015-10-08 15:31:38.544673] I [master(/data/gluster1/static/brick1):83:gmaster_builder] <top>: setting up xsync change detection mode
[2015-10-08 15:31:38.544924] I [master(/data/gluster1/static/brick1):401:__init__] _GMaster: using 'tar over ssh' as the sync engine
[2015-10-08 15:31:38.546163] I [master(/data/gluster1/static/brick1):83:gmaster_builder] <top>: setting up xsync change detection mode
[2015-10-08 15:31:38.546406] I [master(/data/gluster1/static/brick1):401:__init__] _GMaster: using 'tar over ssh' as the sync engine
[2015-10-08 15:31:38.548989] I [master(/data/gluster1/static/brick1):1220:register] _GMaster: xsync temp directory: /var/lib/misc/glusterfsd/static/ssh%3A%2F%2Froot%40palace%3Agluster%3A%2F%2F127.0.0.1%3Astatic/5f45950b672b0d32fa97e00350eca862/xsync
[2015-10-08 15:31:38.549267] I [master(/data/gluster1/static/brick1):1220:register] _GMaster: xsync temp directory: /var/lib/misc/glusterfsd/static/ssh%3A%2F%2Froot%40palace%3Agluster%3A%2F%2F127.0.0.1%3Astatic/5f45950b672b0d32fa97e00350eca862/xsync
[2015-10-08 15:31:38.549467] I [master(/data/gluster1/static/brick1):1220:register] _GMaster: xsync temp directory: /var/lib/misc/glusterfsd/static/ssh%3A%2F%2Froot%40palace%3Agluster%3A%2F%2F127.0.0.1%3Astatic/5f45950b672b0d32fa97e00350eca862/xsync
[2015-10-08 15:31:38.549632] I [resource(/data/gluster1/static/brick1):1432:service_loop] GLUSTER: Register time: 1444278698
[2015-10-08 15:31:38.582277] I [master(/data/gluster1/static/brick1):530:crawlwrap] _GMaster: primary master with volume id 3f9f810d-a988-4914-a5ca-5bd7b251a273 ...
[2015-10-08 15:31:38.584099] I [master(/data/gluster1/static/brick1):539:crawlwrap] _GMaster: crawl interval: 60 seconds
[2015-10-08 15:31:38.587405] I [master(/data/gluster1/static/brick1):1242:crawl] _GMaster: starting hybrid crawl..., stime: (1444278018, 482251)
[2015-10-08 15:31:38.588735] I [master(/data/gluster1/static/brick1):1249:crawl] _GMaster: finished hybrid crawl syncing, stime: (1444278018, 482251)
[2015-10-08 15:31:38.590116] I [master(/data/gluster1/static/brick1):530:crawlwrap] _GMaster: primary master with volume id 3f9f810d-a988-4914-a5ca-5bd7b251a273 ...
[2015-10-08 15:31:38.591582] I [master(/data/gluster1/static/brick1):539:crawlwrap] _GMaster: crawl interval: 60 seconds
[2015-10-08 15:31:38.593844] I [master(/data/gluster1/static/brick1):1242:crawl] _GMaster: starting hybrid crawl..., stime: (1444278018, 482251)
[2015-10-08 15:31:38.594832] I [master(/data/gluster1/static/brick1):1249:crawl] _GMaster: finished hybrid crawl syncing, stime: (1444278018, 482251)
[2015-10-08 15:32:38.641908] I [master(/data/gluster1/static/brick1):552:crawlwrap] _GMaster: 1 crawls, 0 turns
[2015-10-08 15:32:38.644370] I [master(/data/gluster1/static/brick1):1242:crawl] _GMaster: starting hybrid crawl..., stime: (1444278018, 482251)
[2015-10-08 15:32:39.646733] I [master(/data/gluster1/static/brick1):1252:crawl] _GMaster: processing xsync changelog /var/lib/misc/glusterfsd/static/ssh%3A%2F%2Froot%40palace%3Agluster%3A%2F%2F127.0.0.1%3Astatic/5f45950b672b0d32fa97e00350eca862/xsync/XSYNC-CHANGELOG.1444278758
[2015-10-08 15:32:40.857084] W [master(/data/gluster1/static/brick1):803:log_failures] _GMaster: ENTRY FAILED: ({'uid': 0, 'gfid': 'fc446c88-a5b7-468b-ac52-25b4225fe0cf', 'gid': 0, 'mode': 33188, 'entry': '.gfid/e63d740c-ce17-4107-af35-d4030d2ae466/formguide-2015-10-08-gosford-1.html', 'op': 'MKNOD'}, 17, '02489235-13c5-4232-8d6d-c7843bc5249b')
[2015-10-08 15:32:40.858580] W [master(/data/gluster1/static/brick1):803:log_failures] _GMaster: ENTRY FAILED: ({'uid': 0, 'gfid': 'e08813c5-055a-4354-94ec-f1b41a14b2a4', 'gid': 0, 'mode': 33188, 'entry': '.gfid/e63d740c-ce17-4107-af35-d4030d2ae466/formguide-2015-10-08-gosford-2.html', 'op': 'MKNOD'}, 17, '0abae047-5816-4199-8203-fa8b974dfef5')

...

[2015-10-08 15:33:38.236779] W [master(/data/gluster1/static/brick1):803:log_failures] _GMaster: ENTRY FAILED: ({'uid': 1000, 'gfid': 'a41a2ac7-8fec-46bd-a4cc-8d8794e5ee39', 'gid': 1000, 'mode': 33206, 'entry': '.gfid/553f4651-9c13-4788-89ea-67e1a6d5ab43/1PYhnxMyMMcQo8ukuyMsqq.png', 'op': 'MKNOD'}, 17, 'e047db7d-f96c-496f-8a83-5db8e41859ca')
[2015-10-08 15:33:38.237443] W [master(/data/gluster1/static/brick1):803:log_failures] _GMaster: ENTRY FAILED: ({'uid': 1000, 'gfid': '507f77db-0dc0-4d7f-9eb3-8f56b3e01765', 'gid': 1000, 'mode': 33206, 'entry': '.gfid/553f4651-9c13-4788-89ea-67e1a6d5ab43/17H7rpUIXGEQemM0wCoy6c.png', 'op': 'MKNOD'}, 17, 'ee7fa964-fc92-4008-b38a-e790fbbb1285')
[2015-10-08 15:33:38.238053] W [master(/data/gluster1/static/brick1):803:log_failures] _GMaster: ENTRY FAILED: ({'uid': 1000, 'gfid': '6c495557-6808-4ff9-98de-39afbbeeac82', 'gid': 1000, 'mode': 33206, 'entry': '.gfid/553f4651-9c13-4788-89ea-67e1a6d5ab43/3T3VvUQH44my0Eosiieeok.png', 'op': 'MKNOD'}, 17, 'cc6a75c4-0817-497e-912b-4442fd19db83')
[2015-10-08 15:33:43.615427] W [master(/data/gluster1/static/brick1):1010:process] _GMaster: changelogs XSYNC-CHANGELOG.1444278758 could not be processed - moving on...
[2015-10-08 15:33:43.616425] W [master(/data/gluster1/static/brick1):1014:process] _GMaster: SKIPPED GFID = 6c495557-6808-4ff9-98de-39afbbeeac82,16f94158-2f27-421b-9981-94d4197b2b3b,53d01d46-5724-4c77-846f-aacea7a3a447,9fbb536b-b7c6-41e1-8593-43e8a42b3fbe,1923ceff-d9a4-449e-b1c6-ce37c54d242c,3206332f-ed48-48d7-ad3f-cb82fbda0695,7696c570-edd5-481e-8cdc-3e...[truncated]

That type of entry repeats until

[2015-10-09 11:12:22.590574] I [master(/data/gluster1/static/brick1):1249:crawl] _GMaster: finished hybrid crawl syncing, stime: (1444349280, 617969)
[2015-10-09 11:13:22.650285] I [master(/data/gluster1/static/brick1):552:crawlwrap] _GMaster: 1 crawls, 1 turns
[2015-10-09 11:13:22.653459] I [master(/data/gluster1/static/brick1):1242:crawl] _GMaster: starting hybrid crawl..., stime: (1444349280, 617969)
[2015-10-09 11:13:22.670430] W [master(/data/gluster1/static/brick1):1366:Xcrawl] _GMaster: irregular xtime for ./racesoap/nominations/processed/.processed.2015-10-13.T.Ballina.V1.nomination.1444346457.247.Thj1Ly: ENOENT

and then there were no more logs until 2015-10-13.

Thanks,
Wade.
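An aside on the ENTRY FAILED warnings in the log excerpt above: the bare integer in each tuple (17) looks like an errno, and on Linux errno 17 is EEXIST ("File exists"), which would suggest gsyncd is refusing to create entries that already exist on the slave, presumably under a different gfid. That reading is an interpretation of the log format rather than anything stated in the thread; the errno mapping itself can be confirmed anywhere with a stock Python one-liner:

# shows the symbolic name and message for errno 17 (EEXIST / "File exists")
python -c 'import errno, os; print(errno.errorcode[17], os.strerror(17))'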
On 16/10/2015 4:33 pm, Aravinda wrote:
> Oh ok. I overlooked the status output. Please share the geo-replication logs from "james" and "hilton" nodes.
>
> regards
> Aravinda
>
> On 10/15/2015 05:55 PM, Wade Fitzpatrick wrote:
>> Well I'm kind of worried about the 3 million failures listed in the FAILURES column, the timestamp showing that syncing "stalled" 2 days ago and the fact that only half of the files have been transferred to the remote volume.
>>
>> On 15/10/2015 9:27 pm, Aravinda wrote:
>>> Status looks good. Two master bricks are Active and participating in syncing. Please let us know the issue you are observing.
>>> regards
>>> Aravinda
>>> On 10/15/2015 11:40 AM, Wade Fitzpatrick wrote:
>>>> I have twice now tried to configure geo-replication of our Stripe-Replicate volume to a remote Stripe volume but it always seems to have issues.
>>>>
>>>> root@james:~# gluster volume info
>>>>
>>>> Volume Name: gluster_shared_storage
>>>> Type: Replicate
>>>> Volume ID: 5f446a10-651b-4ce0-a46b-69871f498dbc
>>>> Status: Started
>>>> Number of Bricks: 1 x 4 = 4
>>>> Transport-type: tcp
>>>> Bricks:
>>>> Brick1: james:/data/gluster1/geo-rep-meta/brick
>>>> Brick2: cupid:/data/gluster1/geo-rep-meta/brick
>>>> Brick3: hilton:/data/gluster1/geo-rep-meta/brick
>>>> Brick4: present:/data/gluster1/geo-rep-meta/brick
>>>> Options Reconfigured:
>>>> performance.readdir-ahead: on
>>>>
>>>> Volume Name: static
>>>> Type: Striped-Replicate
>>>> Volume ID: 3f9f810d-a988-4914-a5ca-5bd7b251a273
>>>> Status: Started
>>>> Number of Bricks: 1 x 2 x 2 = 4
>>>> Transport-type: tcp
>>>> Bricks:
>>>> Brick1: james:/data/gluster1/static/brick1
>>>> Brick2: cupid:/data/gluster1/static/brick2
>>>> Brick3: hilton:/data/gluster1/static/brick3
>>>> Brick4: present:/data/gluster1/static/brick4
>>>> Options Reconfigured:
>>>> auth.allow: 10.x.*
>>>> features.scrub: Active
>>>> features.bitrot: on
>>>> performance.readdir-ahead: on
>>>> geo-replication.indexing: on
>>>> geo-replication.ignore-pid-check: on
>>>> changelog.changelog: on
>>>>
>>>> root@palace:~# gluster volume info
>>>>
>>>> Volume Name: static
>>>> Type: Stripe
>>>> Volume ID: 3de935db-329b-4876-9ca4-a0f8d5f184c3
>>>> Status: Started
>>>> Number of Bricks: 1 x 2 = 2
>>>> Transport-type: tcp
>>>> Bricks:
>>>> Brick1: palace:/data/gluster1/static/brick1
>>>> Brick2: madonna:/data/gluster1/static/brick2
>>>> Options Reconfigured:
>>>> features.scrub: Active
>>>> features.bitrot: on
>>>> performance.readdir-ahead: on
>>>>
>>>> root@james:~# gluster vol geo-rep static ssh://gluster-b1::static status detail
>>>>
>>>> MASTER NODE    MASTER VOL    MASTER BRICK                    SLAVE USER    SLAVE                       SLAVE NODE    STATUS     CRAWL STATUS       LAST_SYNCED            ENTRY    DATA    META    FAILURES    CHECKPOINT TIME    CHECKPOINT COMPLETED    CHECKPOINT COMPLETION TIME
>>>> ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>>>> james          static        /data/gluster1/static/brick1    root          ssh://gluster-b1::static    palace        Active     Changelog Crawl    2015-10-13 14:23:20    0        0       0       1952064     N/A                N/A                     N/A
>>>> hilton         static        /data/gluster1/static/brick3    root          ssh://gluster-b1::static    palace        Active     Changelog Crawl    N/A                    0        0       0       1008035     N/A                N/A                     N/A
>>>> present        static        /data/gluster1/static/brick4    root          ssh://gluster-b1::static    madonna       Passive    N/A                N/A                    N/A      N/A     N/A     N/A         N/A                N/A                     N/A
>>>> cupid          static        /data/gluster1/static/brick2    root          ssh://gluster-b1::static    madonna       Passive    N/A                N/A                    N/A      N/A     N/A     N/A         N/A                N/A                     N/A
>>>>
>>>> So just to clarify, data is striped over bricks 1 and 3; bricks 2 and 4 are the replica.
>>>>
>>>> Can someone help me diagnose the problem and find a solution?
>>>>
>>>> Thanks in advance,
>>>> Wade.
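A side note on the numbers in that status output: the two FAILURES counters (1952064 on james and 1008035 on hilton) add up to roughly the three million failures mentioned above, and they appear to correspond to the ENTRY FAILED / SKIPPED GFID warnings that gsyncd writes to the per-brick worker logs. If that is the case, the scale of the problem on each master node can be gauged straight from those logs; a rough, illustrative count on james (log directory as shown in the config output later in this thread) would be:

root@james:~# grep -c "ENTRY FAILED" /var/log/glusterfs/geo-replication/static/*.log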
I have now tried to re-initialise the whole geo-rep setup, but the replication slave went Faulty immediately. Any help here would be appreciated; I cannot even find how to recover a faulty node without recreating the geo-rep session.

root@james:~# gluster volume geo-replication static gluster-b1::static stop
Stopping geo-replication session between static & gluster-b1::static has been successful
root@james:~# gluster volume geo-replication static gluster-b1::static delete
Deleting geo-replication session between static & gluster-b1::static has been successful

I then destroyed and re-created the gluster-b1::static slave volume on the same bricks.

root@palace:~# gluster volume stop static
Stopping volume will make its data inaccessible. Do you want to continue? (y/n) y
volume stop: static: success
root@palace:~# gluster volume delete static
Deleting volume will erase all information about the volume. Do you want to continue? (y/n) y
volume delete: static: success
root@palace:~# gluster volume create static stripe 2 transport tcp palace:/data/gluster1/static/brick1 madonna:/data/gluster1/static/brick2
volume create: static: success: please start the volume to access data
root@palace:~# gluster volume info

Volume Name: static
Type: Stripe
Volume ID: dc14cd83-2736-4faf-8e11-c6d711ff8f56
Status: Created
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: palace:/data/gluster1/static/brick1
Brick2: madonna:/data/gluster1/static/brick2
Options Reconfigured:
performance.readdir-ahead: on

root@palace:~# gluster volume start static
volume start: static: success

Then I established the geo-rep session again:

root@james:~# gluster volume geo-replication static ssh://gluster-b1::static create
Creating geo-replication session between static & ssh://gluster-b1::static has been successful
root@james:~# gluster volume geo-replication static ssh://gluster-b1::static config use_meta_volume true
geo-replication config updated successfully
root@james:~# gluster volume geo-replication static ssh://gluster-b1::static config use-tarssh true
geo-replication config updated successfully
root@james:~# gluster volume geo-replication static ssh://gluster-b1::static config
special_sync_mode: partial
state_socket_unencoded: /var/lib/glusterd/geo-replication/static_gluster-b1_static/ssh%3A%2F%2Froot%40palace%3Agluster%3A%2F%2F127.0.0.1%3Astatic.socket
gluster_log_file: /var/log/glusterfs/geo-replication/static/ssh%3A%2F%2Froot%40palace%3Agluster%3A%2F%2F127.0.0.1%3Astatic.gluster.log
ssh_command: ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i /var/lib/glusterd/geo-replication/secret.pem
use_tarssh: true
ignore_deletes: false
change_detector: changelog
gluster_command_dir: /usr/sbin/
state_file: /var/lib/glusterd/geo-replication/static_gluster-b1_static/ssh%3A%2F%2Froot%40palace%3Agluster%3A%2F%2F127.0.0.1%3Astatic.status
remote_gsyncd: /nonexistent/gsyncd
log_file: /var/log/glusterfs/geo-replication/static/ssh%3A%2F%2Froot%40palace%3Agluster%3A%2F%2F127.0.0.1%3Astatic.log
changelog_log_file: /var/log/glusterfs/geo-replication/static/ssh%3A%2F%2Froot%40palace%3Agluster%3A%2F%2F127.0.0.1%3Astatic-changes.log
socketdir: /var/run/gluster
working_dir: /var/lib/misc/glusterfsd/static/ssh%3A%2F%2Froot%40palace%3Agluster%3A%2F%2F127.0.0.1%3Astatic
state_detail_file: /var/lib/glusterd/geo-replication/static_gluster-b1_static/ssh%3A%2F%2Froot%40palace%3Agluster%3A%2F%2F127.0.0.1%3Astatic-detail.status
use_meta_volume: true
ssh_command_tar: ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i /var/lib/glusterd/geo-replication/tar_ssh.pem
pid_file: /var/lib/glusterd/geo-replication/static_gluster-b1_static/ssh%3A%2F%2Froot%40palace%3Agluster%3A%2F%2F127.0.0.1%3Astatic.pid
georep_session_working_dir: /var/lib/glusterd/geo-replication/static_gluster-b1_static/
gluster_params: aux-gfid-mount acl

root@james:~# gluster volume geo-replication static ssh://gluster-b1::static start
Geo-replication session between static and ssh://gluster-b1::static does not exist.
geo-replication command failed
root@james:~# gluster volume geo-replication static ssh://gluster-b1::static status detail

MASTER NODE    MASTER VOL    MASTER BRICK                    SLAVE USER    SLAVE                       SLAVE NODE    STATUS    CRAWL STATUS    LAST_SYNCED    ENTRY    DATA    META    FAILURES    CHECKPOINT TIME    CHECKPOINT COMPLETED    CHECKPOINT COMPLETION TIME
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
james          static        /data/gluster1/static/brick1    root          ssh://gluster-b1::static    N/A           N/A       N/A             N/A            N/A      N/A     N/A     N/A         N/A                N/A                     N/A
hilton         static        /data/gluster1/static/brick3    root          ssh://gluster-b1::static    N/A           N/A       N/A             N/A            N/A      N/A     N/A     N/A         N/A                N/A                     N/A
present        static        /data/gluster1/static/brick4    root          ssh://gluster-b1::static    N/A           N/A       N/A             N/A            N/A      N/A     N/A     N/A         N/A                N/A                     N/A
cupid          static        /data/gluster1/static/brick2    root          ssh://gluster-b1::static    N/A           N/A       N/A             N/A            N/A      N/A     N/A     N/A         N/A                N/A                     N/A

root@james:~# gluster volume geo-replication static ssh://gluster-b1::static start
Starting geo-replication session between static & ssh://gluster-b1::static has been successful
root@james:~# gluster volume geo-replication static ssh://gluster-b1::static status detail

MASTER NODE    MASTER VOL    MASTER BRICK                    SLAVE USER    SLAVE                       SLAVE NODE    STATUS             CRAWL STATUS    LAST_SYNCED    ENTRY    DATA    META    FAILURES    CHECKPOINT TIME    CHECKPOINT COMPLETED    CHECKPOINT COMPLETION TIME
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
james          static        /data/gluster1/static/brick1    root          ssh://gluster-b1::static    N/A           Initializing...    N/A             N/A            N/A      N/A     N/A     N/A         N/A                N/A                     N/A
hilton         static        /data/gluster1/static/brick3    root          ssh://gluster-b1::static    palace        Active             Hybrid Crawl    N/A            0        0       0       0           N/A                N/A                     N/A
present        static        /data/gluster1/static/brick4    root          ssh://gluster-b1::static    madonna       Passive            N/A             N/A            N/A      N/A     N/A     N/A         N/A                N/A                     N/A
cupid          static        /data/gluster1/static/brick2    root          ssh://gluster-b1::static    madonna       Active             Hybrid Crawl    N/A            0        0       0       0           N/A                N/A                     N/A

root@james:~# gluster volume geo-replication static ssh://gluster-b1::static status detail

MASTER NODE    MASTER VOL    MASTER BRICK                    SLAVE USER    SLAVE                       SLAVE NODE    STATUS     CRAWL STATUS    LAST_SYNCED    ENTRY    DATA    META    FAILURES    CHECKPOINT TIME    CHECKPOINT COMPLETED    CHECKPOINT COMPLETION TIME
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
james          static        /data/gluster1/static/brick1    root          ssh://gluster-b1::static    N/A           Faulty     N/A             N/A            N/A      N/A     N/A     N/A         N/A                N/A                     N/A
hilton         static        /data/gluster1/static/brick3    root          ssh://gluster-b1::static    palace        Active     Hybrid Crawl    N/A            8191     8187    0       0           N/A                N/A                     N/A
present        static        /data/gluster1/static/brick4    root          ssh://gluster-b1::static    madonna       Passive    N/A             N/A            N/A      N/A     N/A     N/A         N/A                N/A                     N/A
cupid          static        /data/gluster1/static/brick2    root          ssh://gluster-b1::static    madonna       Active     Hybrid Crawl    N/A            8191     8187    0       0           N/A                N/A                     N/A
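When a brick shows up as Faulty the way the james brick does here, the usual next step is to check the per-brick worker log on that node, which normally carries the traceback that killed the worker. Going by the log_file path in the config output above (the URL-encoded filename is taken from that output and may differ per node), something along these lines on james would show the most recent failure:

root@james:~# tail -n 100 /var/log/glusterfs/geo-replication/static/ssh%3A%2F%2Froot%40palace%3Agluster%3A%2F%2F127.0.0.1%3Astatic.log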
Hi Wade,

There seems to be some issue in syncing the existing data in the volume using the Xsync crawl. (To give some background: when geo-rep is started, it first does a filesystem crawl (Xsync) to sync all the existing data to the slave, and then the session switches to changelog mode.) We are looking into this.

Any specific reason to go for a Stripe volume? Stripe volumes do not seem to have been extensively tested with geo-rep.

Thanks,
Saravana

On 10/19/2015 08:24 AM, Wade Fitzpatrick wrote:
> The relevant portions of the log appear to be as follows. Everything seemed fairly normal (though quite slow) until
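As a footnote to that background: which phase a session is in appears in the CRAWL STATUS column of status detail (Hybrid Crawl while the initial filesystem/Xsync crawl is running, Changelog Crawl once it has switched over), and the configured change detector can be queried with the same config syntax used earlier in the thread. For the session above this should report changelog, matching the config dump Wade posted:

root@james:~# gluster volume geo-replication static ssh://gluster-b1::static config change_detector
changelog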