thr3ads.net - Gluster users - [Gluster-users] geo-replication {error=12} on one primary node [Feb 2024]


Stefan Kania

2024-Feb-13 19:11 UTC

[Gluster-users] geo-replication {error=12} on one primary node

Hi all,

Yes, I saw that there is a thread about geo-replication with nearly the
same problem. I read it, but I think my problem is a bit different.

I created two volumes, the primary volume "privol01" and the secondary
volume "secvol01". All hosts have the same packages installed; all
hosts are Debian 12 with Gluster version 10.05. So even rsync is the
same on all of the hosts (I installed one host as a VM and cloned it).
I have:

Volume Name: privol01
Type: Replicate
Volume ID: 93ace064-2862-41fe-9606-af5a4af9f5ab
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: p01:/gluster/brick
Brick2: p02:/gluster/brick
Brick3: p03:/gluster/brick

and:

Volume Name: secvol01
Type: Replicate
Volume ID: 4ebb7768-51da-446c-a301-dc3ea49a9ba2
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: s01:/gluster/brick
Brick2: s02:/gluster/brick
Brick3: s03:/gluster/brick

Name resolution between the hosts works in every direction.

This is what I did.

On all secondary hosts:

groupadd geogruppe
useradd -G geogruppe -m geobenutzer
passwd geobenutzer
ln -s /usr/sbin/gluster /usr/bin

on one of the secondary hosts:
gluster-mountbroker setup /var/mountbroker geogruppe

gluster-mountbroker add secvol01 geobenutzer

on one of the primary hosts:
ssh-keygen

ssh-copy-id geobenutzer@s01.gluster

gluster-georep-sshkey generate

gluster v geo-replication privol01 geobenutzer@s01.gluster::secvol01 create push-pem


on one of the secondary hosts:
/usr/libexec/glusterfs/set_geo_rep_pem_keys.sh

All the commands exited without an error message.

I restarted glusterd on all nodes.

Then on the primary host:
gluster volume geo-replication privol01 geobenutzer@s01.gluster::secvol01 start

The status shows:

PRIMARY NODE    PRIMARY VOL    PRIMARY BRICK     SECONDARY USER    SECONDARY                            SECONDARY NODE    STATUS     CRAWL STATUS    LAST_SYNCED
---------------------------------------------------------------------------------------------------------------------------------------------------------------
p03             privol01       /gluster/brick    geobenutzer       geobenutzer@s01.gluster::secvol01                      Passive    N/A             N/A
p02             privol01       /gluster/brick    geobenutzer       geobenutzer@s01.gluster::secvol01                      Passive    N/A             N/A
p01             privol01       /gluster/brick    geobenutzer       geobenutzer@s01.gluster::secvol01    N/A               Faulty     N/A             N/A
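
To pick the faulty workers out of such a status table in a script,
something like the following could work. This is only a sketch: the
helper name "faulty_nodes" is made up for illustration, and it assumes
the status output arrives with one row per line (as on a terminal, not
wrapped as in a mail) and that the primary node is the first column.

```shell
#!/bin/sh
# Read geo-replication status output on stdin and print the first column
# (PRIMARY NODE) of every row that has a field reading exactly "Faulty".
faulty_nodes() {
    awk '{ for (i = 2; i <= NF; i++) if ($i == "Faulty") { print $1; break } }'
}

# Example call (needs a running session, so it is left commented out):
# gluster volume geo-replication privol01 geobenutzer@s01.gluster::secvol01 status | faulty_nodes
```

For the table above this would print only p01.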

For p01 the status cycles from "Initializing..." to "Active" with crawl
status "History Crawl", then to "Faulty", and then back to
"Initializing...".

This happens only on the primary host p01.

Here is the log from p01:
--------------------------------
[2024-02-13 18:30:06.64585] I [gsyncdstatus(monitor):247:set_worker_status] GeorepStatus: Worker Status Change [{status=Initializing...}]
[2024-02-13 18:30:06.65004] I [monitor(monitor):158:monitor] Monitor: starting gsyncd worker [{brick=/gluster/brick}, {secondary_node=s01}]
[2024-02-13 18:30:06.147194] I [resource(worker /gluster/brick):1387:connect_remote] SSH: Initializing SSH connection between primary and secondary...
[2024-02-13 18:30:07.777785] I [resource(worker /gluster/brick):1435:connect_remote] SSH: SSH connection between primary and secondary established. [{duration=1.6304}]
[2024-02-13 18:30:07.777971] I [resource(worker /gluster/brick):1116:connect] GLUSTER: Mounting gluster volume locally...
[2024-02-13 18:30:08.822077] I [resource(worker /gluster/brick):1138:connect] GLUSTER: Mounted gluster volume [{duration=1.0438}]
[2024-02-13 18:30:08.823039] I [subcmds(worker /gluster/brick):84:subcmd_worker] <top>: Worker spawn successful. Acknowledging back to monitor
[2024-02-13 18:30:10.861742] I [primary(worker /gluster/brick):1661:register] _GPrimary: Working dir [{path=/var/lib/misc/gluster/gsyncd/privol01_s01.gluster_secvol01/gluster-brick}]
[2024-02-13 18:30:10.864432] I [resource(worker /gluster/brick):1291:service_loop] GLUSTER: Register time [{time=1707849010}]
[2024-02-13 18:30:10.906805] I [gsyncdstatus(worker /gluster/brick):280:set_active] GeorepStatus: Worker Status Change [{status=Active}]
[2024-02-13 18:30:11.7656] I [gsyncdstatus(worker /gluster/brick):252:set_worker_crawl_status] GeorepStatus: Crawl Status Change [{status=History Crawl}]
[2024-02-13 18:30:11.7984] I [primary(worker /gluster/brick):1572:crawl] _GPrimary: starting history crawl [{turns=1}, {stime=(1707848760, 0)}, {etime=1707849011}, {entry_stime=None}]
[2024-02-13 18:30:12.9234] I [primary(worker /gluster/brick):1604:crawl] _GPrimary: secondary's time [{stime=(1707848760, 0)}]
[2024-02-13 18:30:12.388528] I [primary(worker /gluster/brick):2009:syncjob] Syncer: Sync Time Taken [{job=2}, {num_files=2}, {return_code=12}, {duration=0.0520}]
[2024-02-13 18:30:12.388745] E [syncdutils(worker /gluster/brick):845:errlog] Popen: command returned error [{cmd=rsync -aR0 --inplace --files-from=- --super --stats --numeric-ids --no-implied-dirs --existing --xattrs --acls --ignore-missing-args . -e ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i /var/lib/glusterd/geo-replication/secret.pem -p 22 -oControlMaster=auto -S /tmp/gsyncd-aux-ssh-1_kow1tp/c343d8e67535166a0d66b71865f3f3c4.sock geobenutzer@s01:/proc/2675/cwd}, {error=12}]
[2024-02-13 18:30:12.826546] I [monitor(monitor):227:monitor] Monitor: worker died in startup phase [{brick=/gluster/brick}]
[2024-02-13 18:30:12.845687] I [gsyncdstatus(monitor):247:set_worker_status] GeorepStatus: Worker Status Change [{status=Faulty}]
---------------------
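
A note on reading this log: the {return_code=12} and {error=12} values
are the exit status of the rsync child process, and the rsync man page
lists 12 as "Error in rsync protocol data stream". As a minimal sketch
for decoding such codes (the function name "rsync_exit_meaning" is
hypothetical, and only a handful of codes from the man page are
covered), a lookup helper could look like this:

```shell
#!/bin/sh
# Map an rsync exit status to the description given in the rsync man page.
# Only a few codes are listed here; anything else is reported as unknown.
rsync_exit_meaning() {
    case "$1" in
        0)  echo "Success" ;;
        5)  echo "Error starting client-server protocol" ;;
        10) echo "Error in socket I/O" ;;
        11) echo "Error in file I/O" ;;
        12) echo "Error in rsync protocol data stream" ;;
        23) echo "Partial transfer due to error" ;;
        30) echo "Timeout in data send/receive" ;;
        *)  echo "Unknown rsync exit code: $1" ;;
    esac
}

# Decode the code seen in the p01 log.
rsync_exit_meaning 12
```

The code only says that the data stream between the two rsync processes
broke; it does not by itself say why.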

The host p01 is trying to connect to s01.

A look at host p02 of the primary volume shows:
-------------------
[2024-02-13 18:25:55.179385] I [gsyncdstatus(monitor):247:set_worker_status] GeorepStatus: Worker Status Change [{status=Initializing...}]
[2024-02-13 18:25:55.179572] I [monitor(monitor):158:monitor] Monitor: starting gsyncd worker [{brick=/gluster/brick}, {secondary_node=s01}]
[2024-02-13 18:25:55.258658] I [resource(worker /gluster/brick):1387:connect_remote] SSH: Initializing SSH connection between primary and secondary...
[2024-02-13 18:25:57.78159] I [resource(worker /gluster/brick):1435:connect_remote] SSH: SSH connection between primary and secondary established. [{duration=1.8194}]
[2024-02-13 18:25:57.78254] I [resource(worker /gluster/brick):1116:connect] GLUSTER: Mounting gluster volume locally...
[2024-02-13 18:25:58.123291] I [resource(worker /gluster/brick):1138:connect] GLUSTER: Mounted gluster volume [{duration=1.0450}]
[2024-02-13 18:25:58.123410] I [subcmds(worker /gluster/brick):84:subcmd_worker] <top>: Worker spawn successful. Acknowledging back to monitor
[2024-02-13 18:26:00.135934] I [primary(worker /gluster/brick):1661:register] _GPrimary: Working dir [{path=/var/lib/misc/gluster/gsyncd/privol01_s01.gluster_secvol01/gluster-brick}]
[2024-02-13 18:26:00.136287] I [resource(worker /gluster/brick):1291:service_loop] GLUSTER: Register time [{time=1707848760}]
[2024-02-13 18:26:00.179157] I [gsyncdstatus(worker /gluster/brick):286:set_passive] GeorepStatus: Worker Status Change [{status=Passive}]
------------------
This primary node is also connecting to s01, and there it works.

It must have something to do with the primary host, because if I stop
the replication and restart it, the primary host tries to connect
to a different secondary host and fails with the same error:

----------------
Popen: command returned error [{cmd=rsync -aR0 --inplace --files-from=- --super --stats --numeric-ids --no-implied-dirs --existing --xattrs --acls --ignore-missing-args . -e ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i /var/lib/glusterd/geo-replication/secret.pem -p 22 -oControlMaster=auto -S /tmp/gsyncd-aux-ssh-1_kow1tp/c343d8e67535166a0d66b71865f3f3c4.sock geobenutzer@s01:/proc/2675/cwd}, {error=12}]
----------------

So the problem must be the primary host p01. That's the host on which I
configured the passwordless SSH session.

This is a test setup. I also tried it before with two other volumes with
6 nodes each; there I had 2 faulty nodes in the primary volume.

I can start and stop the replication session from any of the primary
nodes, but p01 is always faulty.


Any help?

Stefan

Anant Saraswat

2024-Feb-13 19:32 UTC


Hi Stefan,

Please enable the geo-replication debug logs using the following command
on the primary server, then recheck or resend the logs.

gluster volume geo-replication privol01 geobenutzer@s01.gluster::secvol01 config log-level DEBUG
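
With DEBUG enabled the log gets noisy, so it can help to pull out only
the error entries and the sync jobs with a non-zero return code. A rough
sketch follows; the helper name "georep_errors" is hypothetical, it
assumes one log entry per line as in the log file itself, and the log
path in the commented example is an assumption derived from the session
name in the "Working dir" log entry, so check where gsyncd.log actually
lives on your system.

```shell
#!/bin/sh
# Print error-level entries ("E" right after the timestamp) and any entry
# whose return_code is non-zero from the gsyncd log file given as $1.
georep_errors() {
    grep -E '^\[[^]]+] E \[|return_code=[1-9]' "$1"
}

# Example call; the path is an assumption, adjust it to your setup:
# georep_errors /var/log/glusterfs/geo-replication/privol01_s01.gluster_secvol01/gsyncd.log
```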

Thanks,
Anant


