Thomas Holkenbrink
2015-Feb-14 08:58 UTC
[Gluster-users] Gluster replace-brick issues (Distributed-Replica)
We have tried to migrate a brick from one server to another using the commands below, but the data is NOT being replicated... and the brick is not showing up anymore. Gluster still appears to be working, but the bricks are not balanced, and I need to add the other brick for Server3 — which I don't want to do until after Server1:Brick2 gets replicated.

This is the command used to create the original volume:

[root at Server1 ~]# gluster volume create Storage1 replica 2 transport tcp Server1:/exp/br01/brick1 Server2:/exp/br01/brick1 Server1:/exp/br02/brick2 Server2:/exp/br02/brick2

This is the current configuration BEFORE the migration. Server3 has been peer-probed successfully, but that has been it:

[root at Server1 ~]# gluster --version
glusterfs 3.6.2 built on Jan 22 2015 12:58:11

[root at Server1 ~]# gluster volume status
Status of volume: Storage1
Gluster process                                 Port    Online  Pid
------------------------------------------------------------------------------
Brick Server1:/exp/br01/brick1                  49152   Y       2167
Brick Server2:/exp/br01/brick1                  49152   Y       2192
Brick Server1:/exp/br02/brick2                  49153   Y       2172   <--- this is the one that goes missing
Brick Server2:/exp/br02/brick2                  49153   Y       2193
NFS Server on localhost                         2049    Y       2181
Self-heal Daemon on localhost                   N/A     Y       2186
NFS Server on Server2                           2049    Y       2205
Self-heal Daemon on Server2                     N/A     Y       2210
NFS Server on Server3                           2049    Y       6015
Self-heal Daemon on Server3                     N/A     Y       6016

Task Status of Volume Storage1
------------------------------------------------------------------------------
There are no active volume tasks

[root at Server1 ~]# gluster volume info

Volume Name: Storage1
Type: Distributed-Replicate
Volume ID: 9616ce42-48bd-4fe3-883f-decd6c4fcd00
Status: Started
Number of Bricks: 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: Server1:/exp/br01/brick1
Brick2: Server2:/exp/br01/brick1
Brick3: Server1:/exp/br02/brick2
Brick4: Server2:/exp/br02/brick2
Options Reconfigured:
diagnostics.brick-log-level: WARNING
diagnostics.client-log-level: WARNING
cluster.entry-self-heal: off
cluster.data-self-heal: off
cluster.metadata-self-heal: off
performance.cache-size: 1024MB
performance.cache-max-file-size: 2MB
performance.cache-refresh-timeout: 1
performance.stat-prefetch: off
performance.read-ahead: on
performance.quick-read: off
performance.write-behind-window-size: 4MB
performance.flush-behind: on
performance.write-behind: on
performance.io-thread-count: 32
performance.io-cache: on
network.ping-timeout: 2
nfs.addr-namelookup: off
performance.strict-write-ordering: on
[root at Server1 ~]#

So we start the migration of the brick to the new server using the replace-brick command:

[root at Server1 ~]# volname=Storage1
[root at Server1 ~]# from=Server1:/exp/br02/brick2
[root at Server1 ~]# to=Server3:/exp/br02/brick2
[root at Server1 ~]# gluster volume replace-brick $volname $from $to start
All replace-brick commands except commit force are deprecated. Do you want to continue? (y/n) y
volume replace-brick: success: replace-brick started successfully
ID: 0062d555-e7eb-4ebe-a264-7e0baf6e7546

[root at Server1 ~]# gluster volume replace-brick $volname $from $to status
All replace-brick commands except commit force are deprecated. Do you want to continue? (y/n) y
volume replace-brick: success: Number of files migrated = 281       Migration complete

At this point everything seems to be in order, with no outstanding issues.
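(Editor's note: before committing a replace-brick, one way to sanity-check that the migration actually copied everything is to compare real file counts on the source and destination bricks, pruning GlusterFS's internal .glusterfs/ directory, which holds metadata and hard links and would skew the count. A sketch only — the helper name is made up here, the brick paths are the ones from this thread, and it must be run locally on each server:)

```shell
#!/bin/sh
# count_brick_files: count regular files on a brick, skipping the internal
# .glusterfs/ tree so GlusterFS metadata and hard links are not counted.
count_brick_files() {
    brick=$1
    find "$brick" -path "$brick/.glusterfs" -prune -o -type f -print | wc -l
}

# Run on Server1:  count_brick_files /exp/br02/brick2   (source brick)
# Run on Server3:  count_brick_files /exp/br02/brick2   (destination brick)
# The two counts should match before a commit is safe.
```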
[root at Server1 ~]# gluster volume status
Status of volume: Storage1
Gluster process                                 Port    Online  Pid
------------------------------------------------------------------------------
Brick Server1:/exp/br01/brick1                  49152   Y       2167
Brick Server2:/exp/br01/brick1                  49152   Y       2192
Brick Server1:/exp/br02/brick2                  49153   Y       27557
Brick Server2:/exp/br02/brick2                  49153   Y       2193
NFS Server on localhost                         2049    Y       27562
Self-heal Daemon on localhost                   N/A     Y       2186
NFS Server on Server2                           2049    Y       2205
Self-heal Daemon on Server2                     N/A     Y       2210
NFS Server on Server3                           2049    Y       6015
Self-heal Daemon on Server3                     N/A     Y       6016

Task Status of Volume Storage1
------------------------------------------------------------------------------
Task                 : Replace brick
ID                   : 0062d555-e7eb-4ebe-a264-7e0baf6e7546
Source Brick         : Server1:/exp/br02/brick2
Destination Brick    : Server3:/exp/br02/brick2
Status               : completed

The volume reports that the replace-brick task completed... so the next step is to commit the change:

[root at Server1 ~]# gluster volume replace-brick $volname $from $to commit
All replace-brick commands except commit force are deprecated. Do you want to continue? (y/n) y
volume replace-brick: success: replace-brick commit successful

At this point, when I take a look at the status, I see that the OLD brick (Server1:/exp/br02/brick2) is now missing AND I don't see the new brick either... WTF... panic!
[root at Server1 ~]# gluster volume status
Status of volume: Storage1
Gluster process                                 Port    Online  Pid
------------------------------------------------------------------------------
Brick Server1:/exp/br01/brick1                  49152   Y       2167
Brick Server2:/exp/br01/brick1                  49152   Y       2192
Brick Server2:/exp/br02/brick2                  49153   Y       2193
NFS Server on localhost                         2049    Y       28906
Self-heal Daemon on localhost                   N/A     Y       28911
NFS Server on Server2                           2049    Y       2205
Self-heal Daemon on Server2                     N/A     Y       2210
NFS Server on Server3                           2049    Y       6015
Self-heal Daemon on Server3                     N/A     Y       6016

Task Status of Volume Storage1
------------------------------------------------------------------------------
There are no active volume tasks

After the commit, Server1 no longer lists the task... yet Server2 and Server3 see this:

[root at Server2 ~]# gluster volume status
Status of volume: Storage1
Gluster process                                 Port    Online  Pid
------------------------------------------------------------------------------
Brick Server1:/exp/br01/brick1                  49152   Y       2167
Brick Server2:/exp/br01/brick1                  49152   Y       2192
Brick Server2:/exp/br02/brick2                  49153   Y       2193
NFS Server on localhost                         2049    Y       2205
Self-heal Daemon on localhost                   N/A     Y       2210
NFS Server on 10.45.16.17                       2049    Y       28906
Self-heal Daemon on 10.45.16.17                 N/A     Y       28911
NFS Server on server3                           2049    Y       6015
Self-heal Daemon on server3                     N/A     Y       6016

Task Status of Volume Storage1
------------------------------------------------------------------------------
Task                 : Replace brick
ID                   : 0062d555-e7eb-4ebe-a264-7e0baf6e7546
Source Brick         : Server1:/exp/br02/brick2
Destination Brick    : server3:/exp/br02/brick2
Status               : completed

If I navigate the brick on Server3, the brick is NOT empty... but it is missing A LOT! It's like the replace-brick stopped... and never restarted again. The replace-brick status reported back "Number of files migrated = 281       Migration complete", but when I look on the Server3 brick I get:

[root at Server3 brick2]# find . -type f -print | wc -l
16

I'm missing 265 files (they still exist on the OLD brick,
but how can I move them?)

If I try to add the old brick back, paired with another brick on the new server, as such:

[root at Server1 ~]# gluster volume add-brick Storage1 Server1:/exp/br02/brick2 Server3:/exp/br01/brick1
volume add-brick: failed: /exp/br02/brick2 is already part of a volume

I'm fearful of running:

[root at Server1 ~]# setfattr -n trusted.glusterfs.volume-id -v 0x$(grep volume-id /var/lib/glusterd/vols/$volname/info | cut -d= -f2 | sed 's/-//g') /exp/br02/brick2

although it should allow me to add the brick.

Gluster heal info returns:

[root at Server2 ~]# gluster volume heal Storage1 info
Brick Server1:/exp/br01/brick1/
Number of entries: 0

Brick Server2:/exp/br01/brick1/
Number of entries: 0

Brick Server1:/exp/br02/brick2
Status: Transport endpoint is not connected

Brick Server2:/exp/br02/brick2/
Number of entries: 0

I have restarted glusterd numerous times. At this time I'm not sure where to go from here... I know that Server1:/exp/br02/brick2 still has all the data, and Server3:/exp/br02/brick2 is not complete.

How do I actually get the brick to replicate?

How can I add Server1:/exp/br02/brick2 back into the trusted pool if I can't replicate it, or re-add it?

How can I fix this to get it back into a replicated state between the three servers?
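(Editor's note: the command substitution inside that setfattr line can be tested in isolation, without touching any extended attributes. A sketch, feeding the same pipeline a sample "volume-id=" line using the Volume ID shown in the gluster volume info output above, instead of reading the live info file:)

```shell
#!/bin/sh
# The setfattr command above rebuilds the raw volume-id hex string from the
# "volume-id=" line in /var/lib/glusterd/vols/$volname/info: keep the value
# after "=", then strip the dashes. Same pipeline, on a sample line:
info_line="volume-id=9616ce42-48bd-4fe3-883f-decd6c4fcd00"
vol_id=$(printf '%s\n' "$info_line" | cut -d= -f2 | sed 's/-//g')
echo "0x$vol_id"   # the -v argument that setfattr would receive
```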
Thomas

----DATA----

Gluster volume info at this point:

[root at Server1 ~]# gluster volume info

Volume Name: Storage1
Type: Distributed-Replicate
Volume ID: 9616ce42-48bd-4fe3-883f-decd6c4fcd00
Status: Started
Number of Bricks: 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: Server1:/exp/br01/brick1
Brick2: Server2:/exp/br01/brick1
Brick3: server3:/exp/br02/brick2
Brick4: Server2:/exp/br02/brick2
Options Reconfigured:
diagnostics.brick-log-level: WARNING
diagnostics.client-log-level: WARNING
cluster.entry-self-heal: off
cluster.data-self-heal: off
cluster.metadata-self-heal: off
performance.cache-size: 1024MB
performance.cache-max-file-size: 2MB
performance.cache-refresh-timeout: 1
performance.stat-prefetch: off
performance.read-ahead: on
performance.quick-read: off
performance.write-behind-window-size: 4MB
performance.flush-behind: on
performance.write-behind: on
performance.io-thread-count: 32
performance.io-cache: on
network.ping-timeout: 2
nfs.addr-namelookup: off
performance.strict-write-ordering: on
[root at Server1 ~]#

[root at server3 brick2]# gluster volume heal Storage1 info
Brick Server1:/exp/br01/brick1/
Number of entries: 0

Brick Server2:/exp/br01/brick1/
Number of entries: 0

Brick Server3:/exp/br02/brick2/
Number of entries: 0

Brick Server2:/exp/br02/brick2/
Number of entries: 0

Gluster log (there are a few errors, but I'm not sure how to decipher them):

[2015-02-14 06:29:19.862809] I [MSGID: 106005] [glusterd-handler.c:4142:__glusterd_brick_rpc_notify] 0-management: Brick Server1:/exp/br02/brick2 has disconnected from glusterd.
[2015-02-14 06:29:19.862836] W [socket.c:611:__socket_rwv] 0-management: readv on /var/run/7565ec897c6454bd3e2f4800250a7221.socket failed (Invalid argument)
[2015-02-14 06:29:19.862853] I [MSGID: 106006] [glusterd-handler.c:4257:__glusterd_nodesvc_rpc_notify] 0-management: nfs has disconnected from glusterd.
[2015-02-14 06:29:19.953762] I [glusterd-pmap.c:227:pmap_registry_bind] 0-pmap: adding brick /exp/br02/brick2 on port 49153
[2015-02-14 06:31:12.977450] I [glusterd-replace-brick.c:99:__glusterd_handle_replace_brick] 0-management: Received replace brick req
[2015-02-14 06:31:12.977495] I [glusterd-replace-brick.c:154:__glusterd_handle_replace_brick] 0-management: Received replace brick status request
[2015-02-14 06:31:13.048852] I [glusterd-replace-brick.c:1412:rb_update_srcbrick_port] 0-: adding src-brick port no
[2015-02-14 06:31:19.588380] I [glusterd-replace-brick.c:99:__glusterd_handle_replace_brick] 0-management: Received replace brick req
[2015-02-14 06:31:19.588422] I [glusterd-replace-brick.c:154:__glusterd_handle_replace_brick] 0-management: Received replace brick status request
[2015-02-14 06:31:19.661101] I [glusterd-replace-brick.c:1412:rb_update_srcbrick_port] 0-: adding src-brick port no
[2015-02-14 06:31:45.115355] W [glusterd-op-sm.c:4021:glusterd_op_modify_op_ctx] 0-management: op_ctx modification failed
[2015-02-14 06:31:45.118597] I [glusterd-handler.c:3803:__glusterd_handle_status_volume] 0-management: Received status volume req for volume Storage1
[2015-02-14 06:32:10.956357] I [glusterd-replace-brick.c:99:__glusterd_handle_replace_brick] 0-management: Received replace brick req
[2015-02-14 06:32:10.956385] I [glusterd-replace-brick.c:154:__glusterd_handle_replace_brick] 0-management: Received replace brick commit request
[2015-02-14 06:32:11.028472] I [glusterd-replace-brick.c:1412:rb_update_srcbrick_port] 0-: adding src-brick port no
[2015-02-14 06:32:12.122552] I [glusterd-utils.c:6276:glusterd_nfs_pmap_deregister] 0-: De-registered MOUNTV3 successfully
[2015-02-14 06:32:12.131836] I [glusterd-utils.c:6281:glusterd_nfs_pmap_deregister] 0-: De-registered MOUNTV1 successfully
[2015-02-14 06:32:12.141107] I [glusterd-utils.c:6286:glusterd_nfs_pmap_deregister] 0-: De-registered NFSV3 successfully
[2015-02-14 06:32:12.150375] I [glusterd-utils.c:6291:glusterd_nfs_pmap_deregister] 0-: De-registered NLM v4 successfully
[2015-02-14 06:32:12.159630] I [glusterd-utils.c:6296:glusterd_nfs_pmap_deregister] 0-: De-registered NLM v1 successfully
[2015-02-14 06:32:12.168889] I [glusterd-utils.c:6301:glusterd_nfs_pmap_deregister] 0-: De-registered ACL v3 successfully
[2015-02-14 06:32:13.254689] I [rpc-clnt.c:969:rpc_clnt_connection_init] 0-management: setting frame-timeout to 600
[2015-02-14 06:32:13.254799] W [socket.c:2992:socket_connect] 0-management: Ignore failed connection attempt on , (No such file or directory)
[2015-02-14 06:32:13.257790] I [rpc-clnt.c:969:rpc_clnt_connection_init] 0-management: setting frame-timeout to 600
[2015-02-14 06:32:13.257908] W [socket.c:2992:socket_connect] 0-management: Ignore failed connection attempt on , (No such file or directory)
[2015-02-14 06:32:13.258031] W [socket.c:611:__socket_rwv] 0-socket.management: writev on 127.0.0.1:1019 failed (Broken pipe)
[2015-02-14 06:32:13.258111] W [socket.c:611:__socket_rwv] 0-socket.management: writev on 127.0.0.1:1021 failed (Broken pipe)
[2015-02-14 06:32:13.258130] W [socket.c:611:__socket_rwv] 0-socket.management: writev on 10.45.16.17:1018 failed (Broken pipe)
[2015-02-14 06:32:13.711948] I [mem-pool.c:545:mem_pool_destroy] 0-management: size=588 max=0 total=0
[2015-02-14 06:32:13.711967] I [mem-pool.c:545:mem_pool_destroy] 0-management: size=124 max=0 total=0
[2015-02-14 06:32:13.712008] I [mem-pool.c:545:mem_pool_destroy] 0-management: size=588 max=0 total=0
[2015-02-14 06:32:13.712021] I [mem-pool.c:545:mem_pool_destroy] 0-management: size=124 max=0 total=0
[2015-02-14 06:32:13.731311] I [mem-pool.c:545:mem_pool_destroy] 0-management: size=588 max=0 total=0
[2015-02-14 06:32:13.731326] I [mem-pool.c:545:mem_pool_destroy] 0-management: size=124 max=0 total=0
[2015-02-14 06:32:13.731356] I [glusterd-pmap.c:271:pmap_registry_remove] 0-pmap: removing brick /exp/br02/brick2 on port 49153
[2015-02-14 06:32:13.823129] I [socket.c:2344:socket_event_handler] 0-transport: disconnecting now
[2015-02-14 06:32:13.840668] W [socket.c:611:__socket_rwv] 0-management: readv on /var/run/7565ec897c6454bd3e2f4800250a7221.socket failed (Invalid argument)
[2015-02-14 06:32:13.840693] I [MSGID: 106006] [glusterd-handler.c:4257:__glusterd_nodesvc_rpc_notify] 0-management: nfs has disconnected from glusterd.
[2015-02-14 06:32:13.840712] W [socket.c:611:__socket_rwv] 0-management: readv on /var/run/ac4c043d3c6a2e5159c86e8c75c51829.socket failed (Invalid argument)
[2015-02-14 06:32:13.840728] I [MSGID: 106006] [glusterd-handler.c:4257:__glusterd_nodesvc_rpc_notify] 0-management: glustershd has disconnected from glusterd.
[2015-02-14 06:32:14.729667] E [glusterd-rpc-ops.c:1169:__glusterd_commit_op_cbk] 0-management: Received commit RJT from uuid: 294aa603-ec24-44b9-864b-0fe743faa8d9
[2015-02-14 06:32:14.743623] E [glusterd-rpc-ops.c:1169:__glusterd_commit_op_cbk] 0-management: Received commit RJT from uuid: 92aabaf4-4b6c-48da-82b6-c465aff2ec6d
[2015-02-14 06:32:18.762975] W [glusterd-op-sm.c:4021:glusterd_op_modify_op_ctx] 0-management: op_ctx modification failed
[2015-02-14 06:32:18.764552] I [glusterd-handler.c:3803:__glusterd_handle_status_volume] 0-management: Received status volume req for volume Storage1
[2015-02-14 06:32:18.769051] E [glusterd-utils.c:9955:glusterd_volume_status_aggregate_tasks_status] 0-management: Local tasks count (0) and remote tasks count (1) do not match. Not aggregating tasks status.
[2015-02-14 06:32:18.769070] E [glusterd-syncop.c:961:_gd_syncop_commit_op_cbk] 0-management: Failed to aggregate response from node/brick
[2015-02-14 06:32:18.771095] E [glusterd-utils.c:9955:glusterd_volume_status_aggregate_tasks_status] 0-management: Local tasks count (0) and remote tasks count (1) do not match. Not aggregating tasks status.
[2015-02-14 06:32:18.771108] E [glusterd-syncop.c:961:_gd_syncop_commit_op_cbk] 0-management: Failed to aggregate response from node/brick
[2015-02-14 06:32:48.570796] W [glusterd-op-sm.c:4021:glusterd_op_modify_op_ctx] 0-management: op_ctx modification failed
[2015-02-14 06:32:48.572352] I [glusterd-handler.c:3803:__glusterd_handle_status_volume] 0-management: Received status volume req for volume Storage1
[2015-02-14 06:32:48.576899] E [glusterd-utils.c:9955:glusterd_volume_status_aggregate_tasks_status] 0-management: Local tasks count (0) and remote tasks count (1) do not match. Not aggregating tasks status.
[2015-02-14 06:32:48.576918] E [glusterd-syncop.c:961:_gd_syncop_commit_op_cbk] 0-management: Failed to aggregate response from node/brick
[2015-02-14 06:32:48.578982] E [glusterd-utils.c:9955:glusterd_volume_status_aggregate_tasks_status] 0-management: Local tasks count (0) and remote tasks count (1) do not match. Not aggregating tasks status.
[2015-02-14 06:32:48.579001] E [glusterd-syncop.c:961:_gd_syncop_commit_op_cbk] 0-management: Failed to aggregate response from node/brick
[2015-02-14 06:36:57.840738] W [glusterd-op-sm.c:4021:glusterd_op_modify_op_ctx] 0-management: op_ctx modification failed
[2015-02-14 06:36:57.842370] I [glusterd-handler.c:3803:__glusterd_handle_status_volume] 0-management: Received status volume req for volume Storage1
[2015-02-14 06:36:57.846919] E [glusterd-utils.c:9955:glusterd_volume_status_aggregate_tasks_status] 0-management: Local tasks count (0) and remote tasks count (1) do not match. Not aggregating tasks status.
[2015-02-14 06:36:57.846941] E [glusterd-syncop.c:961:_gd_syncop_commit_op_cbk] 0-management: Failed to aggregate response from node/brick
[2015-02-14 06:36:57.849026] E [glusterd-utils.c:9955:glusterd_volume_status_aggregate_tasks_status] 0-management: Local tasks count (0) and remote tasks count (1) do not match. Not aggregating tasks status.
[2015-02-14 06:36:57.849046] E [glusterd-syncop.c:961:_gd_syncop_commit_op_cbk] 0-management: Failed to aggregate response from node/brick
[2015-02-14 06:37:20.208081] W [glusterd-op-sm.c:4021:glusterd_op_modify_op_ctx] 0-management: op_ctx modification failed
[2015-02-14 06:37:20.211279] I [glusterd-handler.c:3803:__glusterd_handle_status_volume] 0-management: Received status volume req for volume Storage1
[2015-02-14 06:37:20.215792] E [glusterd-utils.c:9955:glusterd_volume_status_aggregate_tasks_status] 0-management: Local tasks count (0) and remote tasks count (1) do not match. Not aggregating tasks status.
[2015-02-14 06:37:20.215809] E [glusterd-syncop.c:961:_gd_syncop_commit_op_cbk] 0-management: Failed to aggregate response from node/brick
[2015-02-14 06:37:20.216295] E [glusterd-utils.c:9955:glusterd_volume_status_aggregate_tasks_status] 0-management: Local tasks count (0) and remote tasks count (1) do not match. Not aggregating tasks status.
[2015-02-14 06:37:20.216308] E [glusterd-syncop.c:961:_gd_syncop_commit_op_cbk] 0-management: Failed to aggregate response from node/brick
Subrata Ghosh
2015-Feb-15 05:41 UTC
[Gluster-users] Gluster replace-brick issues (Distributed-Replica)
Please try the "commit force" option and see whether it works; as gluster suggests, all the other replace-brick commands have been deprecated:

[root at Server1 ~]# gluster volume replace-brick $volname $from $to start
All replace-brick commands except commit force are deprecated. Do you want to continue? (y/n) y
volume replace-brick: success: replace-brick started successfully
ID: 0062d555-e7eb-4ebe-a264-7e0baf6e7546

You could try the following sequence, which we used to resolve our own replace-brick scenario — it looks close to your case. (Note that "gluster volume heal ... info" has some issues, which I requested clarification on a few days back.)

# gluster volume replace-brick $vol_name $old_brick $new_brick commit force
# gluster volume heal $vol_name full
# gluster volume heal $vol_name info   --> this shows that the number of files = 1, even though the file is already healed

Regards,
Subrata

On 02/14/2015 02:28 PM, Thomas Holkenbrink wrote:
> [original message quoted in full]
> > [2015-02-14 06:32:14.729667] E > [glusterd-rpc-ops.c:1169:__glusterd_commit_op_cbk] 0-management: > Received commit RJT from uuid: 294aa603-ec24-44b9-864b-0fe743faa8d9 > > [2015-02-14 06:32:14.743623] E > [glusterd-rpc-ops.c:1169:__glusterd_commit_op_cbk] 0-management: > Received commit RJT from uuid: 92aabaf4-4b6c-48da-82b6-c465aff2ec6d > > [2015-02-14 06:32:18.762975] W > [glusterd-op-sm.c:4021:glusterd_op_modify_op_ctx] 0-management: op_ctx > modification failed > > [2015-02-14 06:32:18.764552] I > [glusterd-handler.c:3803:__glusterd_handle_status_volume] > 0-management: Received status volume req for volume Storage1 > > [2015-02-14 06:32:18.769051] E > [glusterd-utils.c:9955:glusterd_volume_status_aggregate_tasks_status] > 0-management: Local tasks count (0) and remote tasks count (1) do not > match. Not aggregating tasks status. > > [2015-02-14 06:32:18.769070] E > [glusterd-syncop.c:961:_gd_syncop_commit_op_cbk] 0-management: Failed > to aggregate response from node/brick > > [2015-02-14 06:32:18.771095] E > [glusterd-utils.c:9955:glusterd_volume_status_aggregate_tasks_status] > 0-management: Local tasks count (0) and remote tasks count (1) do not > match. Not aggregating tasks status. > > [2015-02-14 06:32:18.771108] E > [glusterd-syncop.c:961:_gd_syncop_commit_op_cbk] 0-management: Failed > to aggregate response from node/brick > > [2015-02-14 06:32:48.570796] W > [glusterd-op-sm.c:4021:glusterd_op_modify_op_ctx] 0-management: op_ctx > modification failed > > [2015-02-14 06:32:48.572352] I > [glusterd-handler.c:3803:__glusterd_handle_status_volume] > 0-management: Received status volume req for volume Storage1 > > [2015-02-14 06:32:48.576899] E > [glusterd-utils.c:9955:glusterd_volume_status_aggregate_tasks_status] > 0-management: Local tasks count (0) and remote tasks count (1) do not > match. Not aggregating tasks status. 
> > [2015-02-14 06:32:48.576918] E > [glusterd-syncop.c:961:_gd_syncop_commit_op_cbk] 0-management: Failed > to aggregate response from node/brick > > [2015-02-14 06:32:48.578982] E > [glusterd-utils.c:9955:glusterd_volume_status_aggregate_tasks_status] > 0-management: Local tasks count (0) and remote tasks count (1) do not > match. Not aggregating tasks status. > > [2015-02-14 06:32:48.579001] E > [glusterd-syncop.c:961:_gd_syncop_commit_op_cbk] 0-management: Failed > to aggregate response from node/brick > > [2015-02-14 06:36:57.840738] W > [glusterd-op-sm.c:4021:glusterd_op_modify_op_ctx] 0-management: op_ctx > modification failed > > [2015-02-14 06:36:57.842370] I > [glusterd-handler.c:3803:__glusterd_handle_status_volume] > 0-management: Received status volume req for volume Storage1 > > [2015-02-14 06:36:57.846919] E > [glusterd-utils.c:9955:glusterd_volume_status_aggregate_tasks_status] > 0-management: Local tasks count (0) and remote tasks count (1) do not > match. Not aggregating tasks status. > > [2015-02-14 06:36:57.846941] E > [glusterd-syncop.c:961:_gd_syncop_commit_op_cbk] 0-management: Failed > to aggregate response from node/brick > > [2015-02-14 06:36:57.849026] E > [glusterd-utils.c:9955:glusterd_volume_status_aggregate_tasks_status] > 0-management: Local tasks count (0) and remote tasks count (1) do not > match. Not aggregating tasks status. 
> > [2015-02-14 06:36:57.849046] E > [glusterd-syncop.c:961:_gd_syncop_commit_op_cbk] 0-management: Failed > to aggregate response from node/brick > > [2015-02-14 06:37:20.208081] W > [glusterd-op-sm.c:4021:glusterd_op_modify_op_ctx] 0-management: op_ctx > modification failed > > [2015-02-14 06:37:20.211279] I > [glusterd-handler.c:3803:__glusterd_handle_status_volume] > 0-management: Received status volume req for volume Storage1 > > [2015-02-14 06:37:20.215792] E > [glusterd-utils.c:9955:glusterd_volume_status_aggregate_tasks_status] > 0-management: Local tasks count (0) and remote tasks count (1) do not > match. Not aggregating tasks status. > > [2015-02-14 06:37:20.215809] E > [glusterd-syncop.c:961:_gd_syncop_commit_op_cbk] 0-management: Failed > to aggregate response from node/brick > > [2015-02-14 06:37:20.216295] E > [glusterd-utils.c:9955:glusterd_volume_status_aggregate_tasks_status] > 0-management: Local tasks count (0) and remote tasks count (1) do not > match. Not aggregating tasks status. > > [2015-02-14 06:37:20.216308] E > [glusterd-syncop.c:961:_gd_syncop_commit_op_cbk] 0-management: Failed > to aggregate response from node/brick > > > > _______________________________________________ > Gluster-users mailing list > Gluster-users at gluster.org > http://www.gluster.org/mailman/listinfo/gluster-users-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://www.gluster.org/pipermail/gluster-users/attachments/20150215/a1da4060/attachment.html>
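The deprecation warning in the transcript ("All replace-brick commands except commit force are deprecated") points at the path that still works in 3.6: skip the broken data-migration mode, swap in the empty brick with `commit force`, and let self-heal copy the data from the surviving replica. A minimal dry-run sketch, using the volume and brick names from this thread — note this volume has all three `cluster.*-self-heal` options switched off, so they would have to be re-enabled first; verify against your own cluster before executing anything:

```shell
# Dry-run sketch only: "run" prints each command instead of executing it,
# so the sequence can be reviewed first. Replace "run" with nothing to
# execute for real. Volume/brick names are the ones from this thread.
run() { echo "+ $*"; }

volname=Storage1
from=Server1:/exp/br02/brick2
to=Server3:/exp/br02/brick2

# This volume has self-heal switched off; re-enable it first, or the new
# brick never gets populated:
run gluster volume set "$volname" cluster.entry-self-heal on
run gluster volume set "$volname" cluster.data-self-heal on
run gluster volume set "$volname" cluster.metadata-self-heal on

# Swap the brick without data migration; AFR self-heal then copies from
# the surviving replica (Server2:/exp/br02/brick2):
run gluster volume replace-brick "$volname" "$from" "$to" commit force

# Kick off a full heal and watch its progress:
run gluster volume heal "$volname" full
run gluster volume heal "$volname" info
```

This is a sketch of the generally documented replace-brick procedure, not something confirmed by the participants in this thread.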
Joe Julian
2015-Feb-15 18:54 UTC
[Gluster-users] Gluster replicate-brick issues (Distributed-Replica)
Of those missing files, are they maybe DHT link files? Mode 1000, size 0.

On February 14, 2015 12:58:12 AM PST, Thomas Holkenbrink <thomas.holkenbrink at fibercloud.com> wrote:
>We have tried to migrate a brick from one server to another using the
>following commands, but the data is NOT being replicated... and the
>brick is not showing up anymore.
>Gluster still appears to be working, but the bricks are not balanced,
>and I need to add the other brick for Server3, which I don't want to do
>until after Server1:/exp/br02/brick2 gets replicated.
>
>This is the command used to create the original volume:
>[root at Server1 ~]# gluster volume create Storage1 replica 2 transport
>tcp Server1:/exp/br01/brick1 Server2:/exp/br01/brick1
>Server1:/exp/br02/brick2 Server2:/exp/br02/brick2
>
>This is the current configuration BEFORE the migration. Server3 has
>been peer probed successfully, but that is all.
>[root at Server1 ~]# gluster --version
>glusterfs 3.6.2 built on Jan 22 2015 12:58:11
>
>[root at Server1 ~]# gluster volume status
>Status of volume: Storage1
>Gluster process                          Port    Online  Pid
>------------------------------------------------------------------------------
>Brick Server1:/exp/br01/brick1           49152   Y       2167
>Brick Server2:/exp/br01/brick1           49152   Y       2192
>Brick Server1:/exp/br02/brick2           49153   Y       2172  <--- this is the one that goes missing
>Brick Server2:/exp/br02/brick2           49153   Y       2193
>NFS Server on localhost                  2049    Y       2181
>Self-heal Daemon on localhost            N/A     Y       2186
>NFS Server on Server2                    2049    Y       2205
>Self-heal Daemon on Server2              N/A     Y       2210
>NFS Server on Server3                    2049    Y       6015
>Self-heal Daemon on Server3              N/A     Y       6016
>
>Task Status of Volume Storage1
>------------------------------------------------------------------------------
>There are no active volume tasks
>
>[root at Server1 ~]# gluster volume info
>
>Volume Name: Storage1
>Type: Distributed-Replicate
>Volume ID: 9616ce42-48bd-4fe3-883f-decd6c4fcd00
>Status: Started
>Number of Bricks: 2 x 2 = 4
>Transport-type: tcp
>Bricks:
>Brick1: Server1:/exp/br01/brick1
>Brick2: Server2:/exp/br01/brick1
>Brick3: Server1:/exp/br02/brick2
>Brick4: Server2:/exp/br02/brick2
>Options Reconfigured:
>diagnostics.brick-log-level: WARNING
>diagnostics.client-log-level: WARNING
>cluster.entry-self-heal: off
>cluster.data-self-heal: off
>cluster.metadata-self-heal: off
>performance.cache-size: 1024MB
>performance.cache-max-file-size: 2MB
>performance.cache-refresh-timeout: 1
>performance.stat-prefetch: off
>performance.read-ahead: on
>performance.quick-read: off
>performance.write-behind-window-size: 4MB
>performance.flush-behind: on
>performance.write-behind: on
>performance.io-thread-count: 32
>performance.io-cache: on
>network.ping-timeout: 2
>nfs.addr-namelookup: off
>performance.strict-write-ordering: on
>[root at Server1 ~]#
>
>So we start the migration of the brick to the new server using the
>replace-brick command:
>[root at Server1 ~]# volname=Storage1
>[root at Server1 ~]# from=Server1:/exp/br02/brick2
>[root at Server1 ~]# to=Server3:/exp/br02/brick2
>
>[root at Server1 ~]# gluster volume replace-brick $volname $from $to start
>All replace-brick commands except commit force are deprecated. Do you
>want to continue? (y/n) y
>volume replace-brick: success: replace-brick started successfully
>ID: 0062d555-e7eb-4ebe-a264-7e0baf6e7546
>
>[root at Server1 ~]# gluster volume replace-brick $volname $from $to status
>All replace-brick commands except commit force are deprecated. Do you
>want to continue? (y/n) y
>volume replace-brick: success: Number of files migrated = 281
>Migration complete
>
>At this point everything seems to be in order with no outstanding
>issues.
>
>[root at Server1 ~]# gluster volume status
>Status of volume: Storage1
>Gluster process                          Port    Online  Pid
>------------------------------------------------------------------------------
>Brick Server1:/exp/br01/brick1           49152   Y       2167
>Brick Server2:/exp/br01/brick1           49152   Y       2192
>Brick Server1:/exp/br02/brick2           49153   Y       27557
>Brick Server2:/exp/br02/brick2           49153   Y       2193
>NFS Server on localhost                  2049    Y       27562
>Self-heal Daemon on localhost            N/A     Y       2186
>NFS Server on Server2                    2049    Y       2205
>Self-heal Daemon on Server2              N/A     Y       2210
>NFS Server on Server3                    2049    Y       6015
>Self-heal Daemon on Server3              N/A     Y       6016
>
>Task Status of Volume Storage1
>------------------------------------------------------------------------------
>Task              : Replace brick
>ID                : 0062d555-e7eb-4ebe-a264-7e0baf6e7546
>Source Brick      : Server1:/exp/br02/brick2
>Destination Brick : Server3:/exp/br02/brick2
>Status            : completed
>
>The volume reports that the replace-brick command completed, so the
>next step is to commit the change:
>
>[root at Server1 ~]# gluster volume replace-brick $volname $from $to commit
>All replace-brick commands except commit force are deprecated. Do you
>want to continue? (y/n) y
>volume replace-brick: success: replace-brick commit successful
>
>At this point when I look at the status, the OLD brick
>(Server1:/exp/br02/brick2) is now missing AND I don't see the new
>brick... WTF... panic!
>
>[root at Server1 ~]# gluster volume status
>Status of volume: Storage1
>Gluster process                          Port    Online  Pid
>------------------------------------------------------------------------------
>Brick Server1:/exp/br01/brick1           49152   Y       2167
>Brick Server2:/exp/br01/brick1           49152   Y       2192
>Brick Server2:/exp/br02/brick2           49153   Y       2193
>NFS Server on localhost                  2049    Y       28906
>Self-heal Daemon on localhost            N/A     Y       28911
>NFS Server on Server2                    2049    Y       2205
>Self-heal Daemon on Server2              N/A     Y       2210
>NFS Server on Server3                    2049    Y       6015
>Self-heal Daemon on Server3              N/A     Y       6016
>
>Task Status of Volume Storage1
>------------------------------------------------------------------------------
>There are no active volume tasks
>
>After the commit, Server1 no longer lists the task... yet Server2 and
>Server3 see this:
>
>[root at Server2 ~]# gluster volume status
>Status of volume: Storage1
>Gluster process                          Port    Online  Pid
>------------------------------------------------------------------------------
>Brick Server1:/exp/br01/brick1           49152   Y       2167
>Brick Server2:/exp/br01/brick1           49152   Y       2192
>Brick Server2:/exp/br02/brick2           49153   Y       2193
>NFS Server on localhost                  2049    Y       2205
>Self-heal Daemon on localhost            N/A     Y       2210
>NFS Server on 10.45.16.17                2049    Y       28906
>Self-heal Daemon on 10.45.16.17          N/A     Y       28911
>NFS Server on server3                    2049    Y       6015
>Self-heal Daemon on server3              N/A     Y       6016
>
>Task Status of Volume Storage1
>------------------------------------------------------------------------------
>Task              : Replace brick
>ID                : 0062d555-e7eb-4ebe-a264-7e0baf6e7546
>Source Brick      : Server1:/exp/br02/brick2
>Destination Brick : server3:/exp/br02/brick2
>Status            : completed
>
>If I navigate the brick on Server3, the brick is NOT empty... but
>missing A LOT! It's as if the replace-brick stopped and never restarted.
>The replace-brick reported "Number of files migrated = 281
>Migration complete", but when I look on the Server3 brick I get:
>[root at Server3 brick2]# find . -type f -print | wc -l
>16
>
>I'm missing 265 files (they still exist on the OLD brick... but how can
>I move them?)
>
>If I try to add the old brick back together with another brick on the
>new server, as in:
>[root at Server1 ~]# gluster volume add-brick Storage1
>Server1:/exp/br02/brick2 Server3:/exp/br01/brick1
>volume add-brick: failed: /exp/br02/brick2 is already part of a volume
>
>I'm fearful of running:
>[root at Server1 ~]# setfattr -n trusted.glusterfs.volume-id -v 0x$(grep
>volume-id /var/lib/glusterd/vols/$volname/info | cut -d= -f2 | sed
>'s/-//g') /exp/br02/brick2
>although it should allow me to add the brick.
>
>Gluster heal info returns:
>[root at Server2 ~]# gluster volume heal Storage1 info
>Brick Server1:/exp/br01/brick1/
>Number of entries: 0
>
>Brick Server2:/exp/br01/brick1/
>Number of entries: 0
>
>Brick Server1:/exp/br02/brick2
>Status: Transport endpoint is not connected
>
>Brick Server2:/exp/br02/brick2/
>Number of entries: 0
>
>I have restarted glusterd numerous times.
>
>At this time I'm not sure where to go from here. I know that
>Server1:/exp/br02/brick2 still has all the data, and
>Server3:/exp/br01/brick1 is not complete.
>
>How do I actually get the brick to replicate?
>How can I add Server1:/exp/br02/brick2 back into the trusted pool if I
>can't replicate it, or re-add it?
>How can I fix this to get it back into a replicated state between the
>three servers?
>
>Thomas
>
>
>----DATA----
>
>Gluster volume info at this point:
>[root at Server1 ~]# gluster volume info
>
>Volume Name: Storage1
>Type: Distributed-Replicate
>Volume ID: 9616ce42-48bd-4fe3-883f-decd6c4fcd00
>Status: Started
>Number of Bricks: 2 x 2 = 4
>Transport-type: tcp
>Bricks:
>Brick1: Server1:/exp/br01/brick1
>Brick2: Server2:/exp/br01/brick1
>Brick3: server3:/exp/br02/brick2
>Brick4: Server2:/exp/br02/brick2
>Options Reconfigured:
>diagnostics.brick-log-level: WARNING
>diagnostics.client-log-level: WARNING
>cluster.entry-self-heal: off
>cluster.data-self-heal: off
>cluster.metadata-self-heal: off
>performance.cache-size: 1024MB
>performance.cache-max-file-size: 2MB
>performance.cache-refresh-timeout: 1
>performance.stat-prefetch: off
>performance.read-ahead: on
>performance.quick-read: off
>performance.write-behind-window-size: 4MB
>performance.flush-behind: on
>performance.write-behind: on
>performance.io-thread-count: 32
>performance.io-cache: on
>network.ping-timeout: 2
>nfs.addr-namelookup: off
>performance.strict-write-ordering: on
>[root at Server1 ~]#
>
>[root at server3 brick2]# gluster volume heal Storage1 info
>Brick Server1:/exp/br01/brick1/
>Number of entries: 0
>
>Brick Server2:/exp/br01/brick1/
>Number of entries: 0
>
>Brick Server3:/exp/br02/brick2/
>Number of entries: 0
>
>Brick Server2:/exp/br02/brick2/
>Number of entries: 0
>
>
>Gluster LOG (there are a few errors, but I'm not sure how to decipher
>them):
>
>[2015-02-14 06:29:19.862809] I [MSGID: 106005] [glusterd-handler.c:4142:__glusterd_brick_rpc_notify] 0-management: Brick Server1:/exp/br02/brick2 has disconnected from glusterd.
>[2015-02-14 06:29:19.862836] W [socket.c:611:__socket_rwv] 0-management: readv on /var/run/7565ec897c6454bd3e2f4800250a7221.socket failed (Invalid argument)
>[2015-02-14 06:29:19.862853] I [MSGID: 106006] [glusterd-handler.c:4257:__glusterd_nodesvc_rpc_notify] 0-management: nfs has disconnected from glusterd.
>[2015-02-14 06:29:19.953762] I [glusterd-pmap.c:227:pmap_registry_bind] 0-pmap: adding brick /exp/br02/brick2 on port 49153
>[2015-02-14 06:31:12.977450] I [glusterd-replace-brick.c:99:__glusterd_handle_replace_brick] 0-management: Received replace brick req
>[2015-02-14 06:31:12.977495] I [glusterd-replace-brick.c:154:__glusterd_handle_replace_brick] 0-management: Received replace brick status request
>[2015-02-14 06:31:13.048852] I [glusterd-replace-brick.c:1412:rb_update_srcbrick_port] 0-: adding src-brick port no
>[2015-02-14 06:31:19.588380] I [glusterd-replace-brick.c:99:__glusterd_handle_replace_brick] 0-management: Received replace brick req
>[2015-02-14 06:31:19.588422] I [glusterd-replace-brick.c:154:__glusterd_handle_replace_brick] 0-management: Received replace brick status request
>[2015-02-14 06:31:19.661101] I [glusterd-replace-brick.c:1412:rb_update_srcbrick_port] 0-: adding src-brick port no
>[2015-02-14 06:31:45.115355] W [glusterd-op-sm.c:4021:glusterd_op_modify_op_ctx] 0-management: op_ctx modification failed
>[2015-02-14 06:31:45.118597] I [glusterd-handler.c:3803:__glusterd_handle_status_volume] 0-management: Received status volume req for volume Storage1
>[2015-02-14 06:32:10.956357] I [glusterd-replace-brick.c:99:__glusterd_handle_replace_brick] 0-management: Received replace brick req
>[2015-02-14 06:32:10.956385] I [glusterd-replace-brick.c:154:__glusterd_handle_replace_brick] 0-management: Received replace brick commit request
>[2015-02-14 06:32:11.028472] I [glusterd-replace-brick.c:1412:rb_update_srcbrick_port] 0-: adding src-brick port no
>[2015-02-14 06:32:12.122552] I [glusterd-utils.c:6276:glusterd_nfs_pmap_deregister] 0-: De-registered MOUNTV3 successfully
>[2015-02-14 06:32:12.131836] I [glusterd-utils.c:6281:glusterd_nfs_pmap_deregister] 0-: De-registered MOUNTV1 successfully
>[2015-02-14 06:32:12.141107] I [glusterd-utils.c:6286:glusterd_nfs_pmap_deregister] 0-: De-registered NFSV3 successfully
>[2015-02-14 06:32:12.150375] I [glusterd-utils.c:6291:glusterd_nfs_pmap_deregister] 0-: De-registered NLM v4 successfully
>[2015-02-14 06:32:12.159630] I [glusterd-utils.c:6296:glusterd_nfs_pmap_deregister] 0-: De-registered NLM v1 successfully
>[2015-02-14 06:32:12.168889] I [glusterd-utils.c:6301:glusterd_nfs_pmap_deregister] 0-: De-registered ACL v3 successfully
>[2015-02-14 06:32:13.254689] I [rpc-clnt.c:969:rpc_clnt_connection_init] 0-management: setting frame-timeout to 600
>[2015-02-14 06:32:13.254799] W [socket.c:2992:socket_connect] 0-management: Ignore failed connection attempt on , (No such file or directory)
>[2015-02-14 06:32:13.257790] I [rpc-clnt.c:969:rpc_clnt_connection_init] 0-management: setting frame-timeout to 600
>[2015-02-14 06:32:13.257908] W [socket.c:2992:socket_connect] 0-management: Ignore failed connection attempt on , (No such file or directory)
>[2015-02-14 06:32:13.258031] W [socket.c:611:__socket_rwv] 0-socket.management: writev on 127.0.0.1:1019 failed (Broken pipe)
>[2015-02-14 06:32:13.258111] W [socket.c:611:__socket_rwv] 0-socket.management: writev on 127.0.0.1:1021 failed (Broken pipe)
>[2015-02-14 06:32:13.258130] W [socket.c:611:__socket_rwv] 0-socket.management: writev on 10.45.16.17:1018 failed (Broken pipe)
>[2015-02-14 06:32:13.711948] I [mem-pool.c:545:mem_pool_destroy] 0-management: size=588 max=0 total=0
>[2015-02-14 06:32:13.711967] I [mem-pool.c:545:mem_pool_destroy] 0-management: size=124 max=0 total=0
>[2015-02-14 06:32:13.712008] I [mem-pool.c:545:mem_pool_destroy] 0-management: size=588 max=0 total=0
>[2015-02-14 06:32:13.712021] I [mem-pool.c:545:mem_pool_destroy] 0-management: size=124 max=0 total=0
>[2015-02-14 06:32:13.731311] I [mem-pool.c:545:mem_pool_destroy] 0-management: size=588 max=0 total=0
>[2015-02-14 06:32:13.731326] I [mem-pool.c:545:mem_pool_destroy] 0-management: size=124 max=0 total=0
>[2015-02-14 06:32:13.731356] I [glusterd-pmap.c:271:pmap_registry_remove] 0-pmap: removing brick /exp/br02/brick2 on port 49153
>[2015-02-14 06:32:13.823129] I [socket.c:2344:socket_event_handler] 0-transport: disconnecting now
>[2015-02-14 06:32:13.840668] W [socket.c:611:__socket_rwv] 0-management: readv on /var/run/7565ec897c6454bd3e2f4800250a7221.socket failed (Invalid argument)
>[2015-02-14 06:32:13.840693] I [MSGID: 106006] [glusterd-handler.c:4257:__glusterd_nodesvc_rpc_notify] 0-management: nfs has disconnected from glusterd.
>[2015-02-14 06:32:13.840712] W [socket.c:611:__socket_rwv] 0-management: readv on /var/run/ac4c043d3c6a2e5159c86e8c75c51829.socket failed (Invalid argument)
>[2015-02-14 06:32:13.840728] I [MSGID: 106006] [glusterd-handler.c:4257:__glusterd_nodesvc_rpc_notify] 0-management: glustershd has disconnected from glusterd.
>[2015-02-14 06:32:14.729667] E [glusterd-rpc-ops.c:1169:__glusterd_commit_op_cbk] 0-management: Received commit RJT from uuid: 294aa603-ec24-44b9-864b-0fe743faa8d9
>[2015-02-14 06:32:14.743623] E [glusterd-rpc-ops.c:1169:__glusterd_commit_op_cbk] 0-management: Received commit RJT from uuid: 92aabaf4-4b6c-48da-82b6-c465aff2ec6d
>[2015-02-14 06:32:18.762975] W [glusterd-op-sm.c:4021:glusterd_op_modify_op_ctx] 0-management: op_ctx modification failed
>[2015-02-14 06:32:18.764552] I [glusterd-handler.c:3803:__glusterd_handle_status_volume] 0-management: Received status volume req for volume Storage1
>[2015-02-14 06:32:18.769051] E [glusterd-utils.c:9955:glusterd_volume_status_aggregate_tasks_status] 0-management: Local tasks count (0) and remote tasks count (1) do not match. Not aggregating tasks status.
>[2015-02-14 06:32:18.769070] E [glusterd-syncop.c:961:_gd_syncop_commit_op_cbk] 0-management: Failed to aggregate response from node/brick
>[2015-02-14 06:32:18.771095] E [glusterd-utils.c:9955:glusterd_volume_status_aggregate_tasks_status] 0-management: Local tasks count (0) and remote tasks count (1) do not match. Not aggregating tasks status.
>[2015-02-14 06:32:18.771108] E [glusterd-syncop.c:961:_gd_syncop_commit_op_cbk] 0-management: Failed to aggregate response from node/brick
>[2015-02-14 06:32:48.570796] W [glusterd-op-sm.c:4021:glusterd_op_modify_op_ctx] 0-management: op_ctx modification failed
>[2015-02-14 06:32:48.572352] I [glusterd-handler.c:3803:__glusterd_handle_status_volume] 0-management: Received status volume req for volume Storage1
>[2015-02-14 06:32:48.576899] E [glusterd-utils.c:9955:glusterd_volume_status_aggregate_tasks_status] 0-management: Local tasks count (0) and remote tasks count (1) do not match. Not aggregating tasks status.
>[2015-02-14 06:32:48.576918] E [glusterd-syncop.c:961:_gd_syncop_commit_op_cbk] 0-management: Failed to aggregate response from node/brick
>[2015-02-14 06:32:48.578982] E [glusterd-utils.c:9955:glusterd_volume_status_aggregate_tasks_status] 0-management: Local tasks count (0) and remote tasks count (1) do not match. Not aggregating tasks status.
>[2015-02-14 06:32:48.579001] E [glusterd-syncop.c:961:_gd_syncop_commit_op_cbk] 0-management: Failed to aggregate response from node/brick
>[2015-02-14 06:36:57.840738] W [glusterd-op-sm.c:4021:glusterd_op_modify_op_ctx] 0-management: op_ctx modification failed
>[2015-02-14 06:36:57.842370] I [glusterd-handler.c:3803:__glusterd_handle_status_volume] 0-management: Received status volume req for volume Storage1
>[2015-02-14 06:36:57.846919] E [glusterd-utils.c:9955:glusterd_volume_status_aggregate_tasks_status] 0-management: Local tasks count (0) and remote tasks count (1) do not match. Not aggregating tasks status.
>[2015-02-14 06:36:57.846941] E [glusterd-syncop.c:961:_gd_syncop_commit_op_cbk] 0-management: Failed to aggregate response from node/brick
>[2015-02-14 06:36:57.849026] E [glusterd-utils.c:9955:glusterd_volume_status_aggregate_tasks_status] 0-management: Local tasks count (0) and remote tasks count (1) do not match. Not aggregating tasks status.
>[2015-02-14 06:36:57.849046] E [glusterd-syncop.c:961:_gd_syncop_commit_op_cbk] 0-management: Failed to aggregate response from node/brick
>[2015-02-14 06:37:20.208081] W [glusterd-op-sm.c:4021:glusterd_op_modify_op_ctx] 0-management: op_ctx modification failed
>[2015-02-14 06:37:20.211279] I [glusterd-handler.c:3803:__glusterd_handle_status_volume] 0-management: Received status volume req for volume Storage1
>[2015-02-14 06:37:20.215792] E [glusterd-utils.c:9955:glusterd_volume_status_aggregate_tasks_status] 0-management: Local tasks count (0) and remote tasks count (1) do not match. Not aggregating tasks status.
>[2015-02-14 06:37:20.215809] E [glusterd-syncop.c:961:_gd_syncop_commit_op_cbk] 0-management: Failed to aggregate response from node/brick
>[2015-02-14 06:37:20.216295] E [glusterd-utils.c:9955:glusterd_volume_status_aggregate_tasks_status] 0-management: Local tasks count (0) and remote tasks count (1) do not match. Not aggregating tasks status.
>[2015-02-14 06:37:20.216308] E [glusterd-syncop.c:961:_gd_syncop_commit_op_cbk] 0-management: Failed to aggregate response from node/brick
>
>------------------------------------------------------------------------
>
>_______________________________________________
>Gluster-users mailing list
>Gluster-users at gluster.org
>http://www.gluster.org/mailman/listinfo/gluster-users

-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.
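On Joe's question above: DHT link files are the zero-byte pointer files whose mode is exactly 1000 (sticky bit only, shown as `---------T` by `ls -l`) that DHT leaves behind when a file's name hashes to one brick but its data lives on another. A quick way to check whether the 16 files found on the new brick are real data or just link files — a sketch, with the brick path taken from the thread, to be run on Server3:

```shell
# Count DHT link files (zero-byte regular files with only the sticky bit
# set) versus real files on a brick. Brick path is the one from the thread;
# Gluster's internal .glusterfs tree is skipped.
BRICK=/exp/br02/brick2

# Link files: sticky bit set, size 0.
find "$BRICK" -path "$BRICK/.glusterfs" -prune -o \
     -type f -perm -1000 -size 0 -print | wc -l

# Real files: regular files without the sticky bit.
find "$BRICK" -path "$BRICK/.glusterfs" -prune -o \
     -type f ! -perm -1000 -print | wc -l
```

If the first count accounts for most of the 16 entries, the migration copied only the DHT pointers and the actual data never left the old brick, which would match the symptoms described.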