Hi,

I hope that these are the logs that you requested.

Logs from 10.32.0.48:
------------------------------
# more /var/log/glusterfs/.cmd_log_history
[2015-03-19 13:52:03.277438]  : peer probe 10.32.1.144 : FAILED : Probe returned with unknown errno -1

# more /var/log/glusterfs/etc-glusterfs-glusterd.vol.log
[2015-03-19 13:41:31.241768] I [MSGID: 100030] [glusterfsd.c:2018:main] 0-/usr/sbin/glusterd: Started running /usr/sbin/glusterd version 3.6.2 (args: /usr/sbin/glusterd -p /var/run/glusterd.pid)
[2015-03-19 13:41:31.245352] I [glusterd.c:1214:init] 0-management: Maximum allowed open file descriptors set to 65536
[2015-03-19 13:41:31.245432] I [glusterd.c:1259:init] 0-management: Using /var/lib/glusterd as working directory
[2015-03-19 13:41:31.247826] I [glusterd-store.c:2063:glusterd_restore_op_version] 0-management: Detected new install. Setting op-version to maximum : 30600
[2015-03-19 13:41:31.247902] I [glusterd-store.c:3497:glusterd_store_retrieve_missed_snaps_list] 0-management: No missed snaps list.
Final graph:
+------------------------------------------------------------------------------+
  1: volume management
  2:     type mgmt/glusterd
  3:     option rpc-auth.auth-glusterfs on
  4:     option rpc-auth.auth-unix on
  5:     option rpc-auth.auth-null on
  6:     option transport.socket.listen-backlog 128
  7:     option ping-timeout 30
  8:     option transport.socket.read-fail-log off
  9:     option transport.socket.keepalive-interval 2
 10:     option transport.socket.keepalive-time 10
 11:     option transport-type socket
 12:     option working-directory /var/lib/glusterd
 13: end-volume
 14:
+------------------------------------------------------------------------------+
[2015-03-19 13:42:02.258403] I [glusterd-handler.c:1015:__glusterd_handle_cli_probe] 0-glusterd: Received CLI probe req 10.32.1.144 24007
[2015-03-19 13:42:02.259456] I [glusterd-handler.c:3165:glusterd_probe_begin] 0-glusterd: Unable to find peerinfo for host: 10.32.1.144 (24007)
[2015-03-19 13:42:02.259664] I [rpc-clnt.c:969:rpc_clnt_connection_init] 0-management: setting frame-timeout to 600
[2015-03-19 13:42:02.260488] I [glusterd-handler.c:3098:glusterd_friend_add] 0-management: connect returned 0
[2015-03-19 13:42:02.270316] I [glusterd.c:176:glusterd_uuid_generate_save] 0-management: generated UUID: 4441e237-89d6-4cdf-a212-f17ecb953b58
[2015-03-19 13:42:02.273427] I [glusterd-rpc-ops.c:244:__glusterd_probe_cbk] 0-management: Received probe resp from uuid: 82cdb873-28cc-4ed0-8cfe-2b6275770429, host: 10.32.1.144
[2015-03-19 13:42:02.273681] I [glusterd-rpc-ops.c:386:__glusterd_probe_cbk] 0-glusterd: Received resp to probe req
[2015-03-19 13:42:02.278863] I [glusterd-handshake.c:1119:__glusterd_mgmt_hndsk_versions_ack] 0-management: using the op-version 30600
[2015-03-19 13:52:03.277422] E [rpc-clnt.c:201:call_bail] 0-management: bailing out frame type(Peer mgmt) op(--(2)) xid = 0x6 sent = 2015-03-19 13:42:02.273482.
timeout = 600 for 10.32.1.144:24007
[2015-03-19 13:52:03.277453] I [socket.c:3366:socket_submit_reply] 0-socket.management: not connected (priv->connected = 255)
[2015-03-19 13:52:03.277468] E [rpcsvc.c:1247:rpcsvc_submit_generic] 0-rpc-service: failed to submit message (XID: 0x1, Program: GlusterD svc cli, ProgVers: 2, Proc: 1) to rpc-transport (socket.management)
[2015-03-19 13:52:03.277483] E [glusterd-utils.c:387:glusterd_submit_reply] 0-: Reply submission failed

Logs from 10.32.1.144:
---------------------------------
# more ./.cmd_log_history

# more ./etc-glusterfs-glusterd.vol.log
[1970-01-01 00:00:53.225739] I [MSGID: 100030] [glusterfsd.c:2018:main] 0-/usr/sbin/glusterd: Started running /usr/sbin/glusterd version 3.6.2 (args: /usr/sbin/glusterd -p /var/run/glusterd.pid)
[1970-01-01 00:00:53.229222] I [glusterd.c:1214:init] 0-management: Maximum allowed open file descriptors set to 65536
[1970-01-01 00:00:53.229301] I [glusterd.c:1259:init] 0-management: Using /var/lib/glusterd as working directory
[1970-01-01 00:00:53.231653] I [glusterd-store.c:2063:glusterd_restore_op_version] 0-management: Detected new install. Setting op-version to maximum : 30600
[1970-01-01 00:00:53.231730] I [glusterd-store.c:3497:glusterd_store_retrieve_missed_snaps_list] 0-management: No missed snaps list.
Final graph:
+------------------------------------------------------------------------------+
  1: volume management
  2:     type mgmt/glusterd
  3:     option rpc-auth.auth-glusterfs on
  4:     option rpc-auth.auth-unix on
  5:     option rpc-auth.auth-null on
  6:     option transport.socket.listen-backlog 128
  7:     option ping-timeout 30
  8:     option transport.socket.read-fail-log off
  9:     option transport.socket.keepalive-interval 2
 10:     option transport.socket.keepalive-time 10
 11:     option transport-type socket
 12:     option working-directory /var/lib/glusterd
 13: end-volume
 14:
+------------------------------------------------------------------------------+
[1970-01-01 00:01:24.417689] I [glusterd-handshake.c:1119:__glusterd_mgmt_hndsk_versions_ack] 0-management: using the op-version 30600
[1970-01-01 00:01:24.417736] I [glusterd.c:176:glusterd_uuid_generate_save] 0-management: generated UUID: 82cdb873-28cc-4ed0-8cfe-2b6275770429
[1970-01-01 00:01:24.420067] I [glusterd-handler.c:2523:__glusterd_handle_probe_query] 0-glusterd: Received probe from uuid: 4441e237-89d6-4cdf-a212-f17ecb953b58
[1970-01-01 00:01:24.420158] I [glusterd-handler.c:2551:__glusterd_handle_probe_query] 0-glusterd: Unable to find peerinfo for host: 10.32.0.48 (24007)
[1970-01-01 00:01:24.420379] I [rpc-clnt.c:969:rpc_clnt_connection_init] 0-management: setting frame-timeout to 600
[1970-01-01 00:01:24.421140] I [glusterd-handler.c:3098:glusterd_friend_add] 0-management: connect returned 0
[1970-01-01 00:01:24.421167] I [glusterd-handler.c:2575:__glusterd_handle_probe_query] 0-glusterd: Responded to 10.32.0.48, op_ret: 0, op_errno: 0, ret: 0
[1970-01-01 00:01:24.422991] I [glusterd-handler.c:2216:__glusterd_handle_incoming_friend_req] 0-glusterd: Received probe from uuid: 4441e237-89d6-4cdf-a212-f17ecb953b58
[1970-01-01 00:01:24.423024] E [glusterd-utils.c:5760:glusterd_compare_friend_data] 0-management: Importing global options failed
[1970-01-01 00:01:24.423036] E [glusterd-sm.c:1078:glusterd_friend_sm] 0-glusterd:
handler returned: -2

Regards
Andreas

On 03/22/15 07:33, Atin Mukherjee wrote:
>
> On 03/22/2015 12:09 AM, Andreas Hollaus wrote:
>> Hi,
>>
>> I get a strange result when I execute 'gluster peer probe'. The command hangs and
>> seems to time out without any message (I can ping the address):
>> # gluster peer probe 10.32.1.144
>> # echo $?
>> 146
> Could you provide the glusterd log and .cmd_log_history for all the
> nodes in the cluster?
>> The status looks promising, but there's a difference between this output and what
>> you normally get from a successful call:
>> # gluster peer status
>> Number of Peers: 1
>>
>> Hostname: 10.32.1.144
>> Uuid: 0b008d3e-c51b-4243-ad19-c79c869ba9f2
>> State: Probe Sent to Peer (Connected)
>>
>> (instead of 'State: Peer in Cluster (Connected)')
>>
>> Running the command again will tell you that it is connected:
>>
>> # gluster peer probe 10.32.1.144
>> peer probe: success. Host 10.32.1.144 port 24007 already in peer list
> This means that the peer was added locally but the peer handshake was not
> completed for the previous peer probe transaction. I would be interested to
> see the logs and can then comment on what went wrong.
>> But when you try to add a brick from that server it fails:
>>
>> # gluster volume add-brick c_test replica 2 10.32.1.144:/opt/lvmdir/c2 force
>> volume add-brick: failed: Host 10.32.1.144 is not in 'Peer in Cluster' state
>>
>> The volume was previously created using the following commands:
>> # gluster volume create c_test 10.32.0.48:/opt/lvmdir/c2 force
>> volume create: c_test: success: please start the volume to access data
>> # gluster volume start c_test
>> volume start: c_test: success
>>
>> What could be the reason for this problem?
>>
>>
>> Regards
>> Andreas
>>
>> _______________________________________________
>> Gluster-users mailing list
>> Gluster-users at gluster.org
>> http://www.gluster.org/mailman/listinfo/gluster-users
>>

--
Best regards
Andreas Hollaus, Ericsson AB
Email: Andreas.Hollaus at ericsson.com
Phone: +46 10 7152961, +46 73 0523760
Isafjordsgatan 10, S-164 80 Stockholm, Sweden
On 03/22/2015 07:11 PM, Andreas Hollaus wrote:
> Hi,
>
> I hope that these are the logs that you requested.
>
> [...]
>
> [2015-03-19 13:52:03.277422] E [rpc-clnt.c:201:call_bail] 0-management: bailing
> out frame type(Peer mgmt) op(--(2)) xid = 0x6 sent = 2015-03-19 13:42:02.273482.
> timeout = 600 for 10.32.1.144:24007

Here is the issue: there was some problem in the network at the time the peer
probe was issued, which is why the call bail is seen. Could you try to deprobe
('gluster peer detach') and then probe it back again?

> [...]
>
> Logs from 10.32.1.144:
> ---------------------------------
> [...]
> [1970-01-01 00:01:24.423036] E [glusterd-sm.c:1078:glusterd_friend_sm] 0-glusterd:
> handler returned: -2
>
> [...]
--
~Atin