Thomas Holkenbrink
2015-Feb-14 08:58 UTC
[Gluster-users] Gluster replace-brick issues (Distributed-Replica)
We have tried to migrate a brick from one server to another using the commands below, but the data is NOT being replicated... and the brick is not showing up anymore. Gluster still appears to be working, but the bricks are not balanced, and I need to add the other brick for Server3 — which I don't want to do until after Server1:Brick2 gets replicated.

This is the command used to create the original volume:

[root at Server1 ~]# gluster volume create Storage1 replica 2 transport tcp Server1:/exp/br01/brick1 Server2:/exp/br01/brick1 Server1:/exp/br02/brick2 Server2:/exp/br02/brick2

This is the current configuration BEFORE the migration. Server3 has been peer-probed successfully, but that has been it:

[root at Server1 ~]# gluster --version
glusterfs 3.6.2 built on Jan 22 2015 12:58:11

[root at Server1 ~]# gluster volume status
Status of volume: Storage1
Gluster process                                 Port    Online  Pid
------------------------------------------------------------------------------
Brick Server1:/exp/br01/brick1                  49152   Y       2167
Brick Server2:/exp/br01/brick1                  49152   Y       2192
Brick Server1:/exp/br02/brick2                  49153   Y       2172   <--- this is the one that goes missing
Brick Server2:/exp/br02/brick2                  49153   Y       2193
NFS Server on localhost                         2049    Y       2181
Self-heal Daemon on localhost                   N/A     Y       2186
NFS Server on Server2                           2049    Y       2205
Self-heal Daemon on Server2                     N/A     Y       2210
NFS Server on Server3                           2049    Y       6015
Self-heal Daemon on Server3                     N/A     Y       6016

Task Status of Volume Storage1
------------------------------------------------------------------------------
There are no active volume tasks

[root at Server1 ~]# gluster volume info

Volume Name: Storage1
Type: Distributed-Replicate
Volume ID: 9616ce42-48bd-4fe3-883f-decd6c4fcd00
Status: Started
Number of Bricks: 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: Server1:/exp/br01/brick1
Brick2: Server2:/exp/br01/brick1
Brick3: Server1:/exp/br02/brick2
Brick4: Server2:/exp/br02/brick2
Options Reconfigured:
diagnostics.brick-log-level: WARNING
diagnostics.client-log-level: WARNING
cluster.entry-self-heal: off
cluster.data-self-heal: off
cluster.metadata-self-heal: off
performance.cache-size: 1024MB
performance.cache-max-file-size: 2MB
performance.cache-refresh-timeout: 1
performance.stat-prefetch: off
performance.read-ahead: on
performance.quick-read: off
performance.write-behind-window-size: 4MB
performance.flush-behind: on
performance.write-behind: on
performance.io-thread-count: 32
performance.io-cache: on
network.ping-timeout: 2
nfs.addr-namelookup: off
performance.strict-write-ordering: on
[root at Server1 ~]#

So we start the migration of the brick to the new server using the replace-brick command:

[root at Server1 ~]# volname=Storage1
[root at Server1 ~]# from=Server1:/exp/br02/brick2
[root at Server1 ~]# to=Server3:/exp/br02/brick2
[root at Server1 ~]# gluster volume replace-brick $volname $from $to start
All replace-brick commands except commit force are deprecated. Do you want to continue? (y/n) y
volume replace-brick: success: replace-brick started successfully
ID: 0062d555-e7eb-4ebe-a264-7e0baf6e7546

[root at Server1 ~]# gluster volume replace-brick $volname $from $to status
All replace-brick commands except commit force are deprecated. Do you want to continue? (y/n) y
volume replace-brick: success: Number of files migrated = 281       Migration complete

At this point everything seems to be in order, with no outstanding issues.
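(Editor's note: before committing a replace-brick, one way to sanity-check that the migration actually copied everything is to compare real file counts on the source and destination bricks, pruning GlusterFS's internal .glusterfs/ directory, which holds metadata and hard links and would skew the count. A sketch only — the helper name is made up here, the brick paths are the ones from this thread, and it must be run locally on each server:)

```shell
#!/bin/sh
# count_brick_files: count regular files on a brick, skipping the internal
# .glusterfs/ tree so GlusterFS metadata and hard links are not counted.
count_brick_files() {
    brick=$1
    find "$brick" -path "$brick/.glusterfs" -prune -o -type f -print | wc -l
}

# Run on Server1:  count_brick_files /exp/br02/brick2   (source brick)
# Run on Server3:  count_brick_files /exp/br02/brick2   (destination brick)
# The two counts should match before a commit is safe.
```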
[root at Server1 ~]# gluster volume status
Status of volume: Storage1
Gluster process                                 Port    Online  Pid
------------------------------------------------------------------------------
Brick Server1:/exp/br01/brick1                  49152   Y       2167
Brick Server2:/exp/br01/brick1                  49152   Y       2192
Brick Server1:/exp/br02/brick2                  49153   Y       27557
Brick Server2:/exp/br02/brick2                  49153   Y       2193
NFS Server on localhost                         2049    Y       27562
Self-heal Daemon on localhost                   N/A     Y       2186
NFS Server on Server2                           2049    Y       2205
Self-heal Daemon on Server2                     N/A     Y       2210
NFS Server on Server3                           2049    Y       6015
Self-heal Daemon on Server3                     N/A     Y       6016

Task Status of Volume Storage1
------------------------------------------------------------------------------
Task                 : Replace brick
ID                   : 0062d555-e7eb-4ebe-a264-7e0baf6e7546
Source Brick         : Server1:/exp/br02/brick2
Destination Brick    : Server3:/exp/br02/brick2
Status               : completed

The volume reports that the replace-brick task completed... so the next step is to commit the change:

[root at Server1 ~]# gluster volume replace-brick $volname $from $to commit
All replace-brick commands except commit force are deprecated. Do you want to continue? (y/n) y
volume replace-brick: success: replace-brick commit successful

At this point, when I take a look at the status, I see that the OLD brick (Server1:/exp/br02/brick2) is now missing AND I don't see the new brick either... WTF... panic!
[root at Server1 ~]# gluster volume status
Status of volume: Storage1
Gluster process                                 Port    Online  Pid
------------------------------------------------------------------------------
Brick Server1:/exp/br01/brick1                  49152   Y       2167
Brick Server2:/exp/br01/brick1                  49152   Y       2192
Brick Server2:/exp/br02/brick2                  49153   Y       2193
NFS Server on localhost                         2049    Y       28906
Self-heal Daemon on localhost                   N/A     Y       28911
NFS Server on Server2                           2049    Y       2205
Self-heal Daemon on Server2                     N/A     Y       2210
NFS Server on Server3                           2049    Y       6015
Self-heal Daemon on Server3                     N/A     Y       6016

Task Status of Volume Storage1
------------------------------------------------------------------------------
There are no active volume tasks

After the commit, Server1 no longer lists the task... yet Server2 and Server3 see this:

[root at Server2 ~]# gluster volume status
Status of volume: Storage1
Gluster process                                 Port    Online  Pid
------------------------------------------------------------------------------
Brick Server1:/exp/br01/brick1                  49152   Y       2167
Brick Server2:/exp/br01/brick1                  49152   Y       2192
Brick Server2:/exp/br02/brick2                  49153   Y       2193
NFS Server on localhost                         2049    Y       2205
Self-heal Daemon on localhost                   N/A     Y       2210
NFS Server on 10.45.16.17                       2049    Y       28906
Self-heal Daemon on 10.45.16.17                 N/A     Y       28911
NFS Server on server3                           2049    Y       6015
Self-heal Daemon on server3                     N/A     Y       6016

Task Status of Volume Storage1
------------------------------------------------------------------------------
Task                 : Replace brick
ID                   : 0062d555-e7eb-4ebe-a264-7e0baf6e7546
Source Brick         : Server1:/exp/br02/brick2
Destination Brick    : server3:/exp/br02/brick2
Status               : completed

If I navigate the brick on Server3, the brick is NOT empty... but it is missing A LOT! It's like the replace-brick stopped... and never restarted again. The replace-brick status reported back "Number of files migrated = 281       Migration complete", but when I look on the Server3 brick I get:

[root at Server3 brick2]# find . -type f -print | wc -l
16

I'm missing 265 files (they still exist on the OLD brick,
but how can I move them?)

If I try to add the old brick back, paired with another brick on the new server, as such:

[root at Server1 ~]# gluster volume add-brick Storage1 Server1:/exp/br02/brick2 Server3:/exp/br01/brick1
volume add-brick: failed: /exp/br02/brick2 is already part of a volume

I'm fearful of running:

[root at Server1 ~]# setfattr -n trusted.glusterfs.volume-id -v 0x$(grep volume-id /var/lib/glusterd/vols/$volname/info | cut -d= -f2 | sed 's/-//g') /exp/br02/brick2

although it should allow me to add the brick.

Gluster heal info returns:

[root at Server2 ~]# gluster volume heal Storage1 info
Brick Server1:/exp/br01/brick1/
Number of entries: 0

Brick Server2:/exp/br01/brick1/
Number of entries: 0

Brick Server1:/exp/br02/brick2
Status: Transport endpoint is not connected

Brick Server2:/exp/br02/brick2/
Number of entries: 0

I have restarted glusterd numerous times. At this time I'm not sure where to go from here... I know that Server1:/exp/br02/brick2 still has all the data, and Server3:/exp/br02/brick2 is not complete.

How do I actually get the brick to replicate?

How can I add Server1:/exp/br02/brick2 back into the trusted pool if I can't replicate it, or re-add it?

How can I fix this to get it back into a replicated state between the three servers?
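(Editor's note: the command substitution inside that setfattr line can be tested in isolation, without touching any extended attributes. A sketch, feeding the same pipeline a sample "volume-id=" line using the Volume ID shown in the gluster volume info output above, instead of reading the live info file:)

```shell
#!/bin/sh
# The setfattr command above rebuilds the raw volume-id hex string from the
# "volume-id=" line in /var/lib/glusterd/vols/$volname/info: keep the value
# after "=", then strip the dashes. Same pipeline, on a sample line:
info_line="volume-id=9616ce42-48bd-4fe3-883f-decd6c4fcd00"
vol_id=$(printf '%s\n' "$info_line" | cut -d= -f2 | sed 's/-//g')
echo "0x$vol_id"   # the -v argument that setfattr would receive
```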
Thomas

----DATA----

Gluster volume info at this point:

[root at Server1 ~]# gluster volume info

Volume Name: Storage1
Type: Distributed-Replicate
Volume ID: 9616ce42-48bd-4fe3-883f-decd6c4fcd00
Status: Started
Number of Bricks: 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: Server1:/exp/br01/brick1
Brick2: Server2:/exp/br01/brick1
Brick3: server3:/exp/br02/brick2
Brick4: Server2:/exp/br02/brick2
Options Reconfigured:
diagnostics.brick-log-level: WARNING
diagnostics.client-log-level: WARNING
cluster.entry-self-heal: off
cluster.data-self-heal: off
cluster.metadata-self-heal: off
performance.cache-size: 1024MB
performance.cache-max-file-size: 2MB
performance.cache-refresh-timeout: 1
performance.stat-prefetch: off
performance.read-ahead: on
performance.quick-read: off
performance.write-behind-window-size: 4MB
performance.flush-behind: on
performance.write-behind: on
performance.io-thread-count: 32
performance.io-cache: on
network.ping-timeout: 2
nfs.addr-namelookup: off
performance.strict-write-ordering: on
[root at Server1 ~]#

[root at server3 brick2]# gluster volume heal Storage1 info
Brick Server1:/exp/br01/brick1/
Number of entries: 0

Brick Server2:/exp/br01/brick1/
Number of entries: 0

Brick Server3:/exp/br02/brick2/
Number of entries: 0

Brick Server2:/exp/br02/brick2/
Number of entries: 0

Gluster log (there are a few errors, but I'm not sure how to decipher them):

[2015-02-14 06:29:19.862809] I [MSGID: 106005] [glusterd-handler.c:4142:__glusterd_brick_rpc_notify] 0-management: Brick Server1:/exp/br02/brick2 has disconnected from glusterd.
[2015-02-14 06:29:19.862836] W [socket.c:611:__socket_rwv] 0-management: readv on /var/run/7565ec897c6454bd3e2f4800250a7221.socket failed (Invalid argument)
[2015-02-14 06:29:19.862853] I [MSGID: 106006] [glusterd-handler.c:4257:__glusterd_nodesvc_rpc_notify] 0-management: nfs has disconnected from glusterd.
[2015-02-14 06:29:19.953762] I [glusterd-pmap.c:227:pmap_registry_bind] 0-pmap: adding brick /exp/br02/brick2 on port 49153
[2015-02-14 06:31:12.977450] I [glusterd-replace-brick.c:99:__glusterd_handle_replace_brick] 0-management: Received replace brick req
[2015-02-14 06:31:12.977495] I [glusterd-replace-brick.c:154:__glusterd_handle_replace_brick] 0-management: Received replace brick status request
[2015-02-14 06:31:13.048852] I [glusterd-replace-brick.c:1412:rb_update_srcbrick_port] 0-: adding src-brick port no
[2015-02-14 06:31:19.588380] I [glusterd-replace-brick.c:99:__glusterd_handle_replace_brick] 0-management: Received replace brick req
[2015-02-14 06:31:19.588422] I [glusterd-replace-brick.c:154:__glusterd_handle_replace_brick] 0-management: Received replace brick status request
[2015-02-14 06:31:19.661101] I [glusterd-replace-brick.c:1412:rb_update_srcbrick_port] 0-: adding src-brick port no
[2015-02-14 06:31:45.115355] W [glusterd-op-sm.c:4021:glusterd_op_modify_op_ctx] 0-management: op_ctx modification failed
[2015-02-14 06:31:45.118597] I [glusterd-handler.c:3803:__glusterd_handle_status_volume] 0-management: Received status volume req for volume Storage1
[2015-02-14 06:32:10.956357] I [glusterd-replace-brick.c:99:__glusterd_handle_replace_brick] 0-management: Received replace brick req
[2015-02-14 06:32:10.956385] I [glusterd-replace-brick.c:154:__glusterd_handle_replace_brick] 0-management: Received replace brick commit request
[2015-02-14 06:32:11.028472] I [glusterd-replace-brick.c:1412:rb_update_srcbrick_port] 0-: adding src-brick port no
[2015-02-14 06:32:12.122552] I [glusterd-utils.c:6276:glusterd_nfs_pmap_deregister] 0-: De-registered MOUNTV3 successfully
[2015-02-14 06:32:12.131836] I [glusterd-utils.c:6281:glusterd_nfs_pmap_deregister] 0-: De-registered MOUNTV1 successfully
[2015-02-14 06:32:12.141107] I [glusterd-utils.c:6286:glusterd_nfs_pmap_deregister] 0-: De-registered NFSV3 successfully
[2015-02-14 06:32:12.150375] I [glusterd-utils.c:6291:glusterd_nfs_pmap_deregister] 0-: De-registered NLM v4 successfully
[2015-02-14 06:32:12.159630] I [glusterd-utils.c:6296:glusterd_nfs_pmap_deregister] 0-: De-registered NLM v1 successfully
[2015-02-14 06:32:12.168889] I [glusterd-utils.c:6301:glusterd_nfs_pmap_deregister] 0-: De-registered ACL v3 successfully
[2015-02-14 06:32:13.254689] I [rpc-clnt.c:969:rpc_clnt_connection_init] 0-management: setting frame-timeout to 600
[2015-02-14 06:32:13.254799] W [socket.c:2992:socket_connect] 0-management: Ignore failed connection attempt on , (No such file or directory)
[2015-02-14 06:32:13.257790] I [rpc-clnt.c:969:rpc_clnt_connection_init] 0-management: setting frame-timeout to 600
[2015-02-14 06:32:13.257908] W [socket.c:2992:socket_connect] 0-management: Ignore failed connection attempt on , (No such file or directory)
[2015-02-14 06:32:13.258031] W [socket.c:611:__socket_rwv] 0-socket.management: writev on 127.0.0.1:1019 failed (Broken pipe)
[2015-02-14 06:32:13.258111] W [socket.c:611:__socket_rwv] 0-socket.management: writev on 127.0.0.1:1021 failed (Broken pipe)
[2015-02-14 06:32:13.258130] W [socket.c:611:__socket_rwv] 0-socket.management: writev on 10.45.16.17:1018 failed (Broken pipe)
[2015-02-14 06:32:13.711948] I [mem-pool.c:545:mem_pool_destroy] 0-management: size=588 max=0 total=0
[2015-02-14 06:32:13.711967] I [mem-pool.c:545:mem_pool_destroy] 0-management: size=124 max=0 total=0
[2015-02-14 06:32:13.712008] I [mem-pool.c:545:mem_pool_destroy] 0-management: size=588 max=0 total=0
[2015-02-14 06:32:13.712021] I [mem-pool.c:545:mem_pool_destroy] 0-management: size=124 max=0 total=0
[2015-02-14 06:32:13.731311] I [mem-pool.c:545:mem_pool_destroy] 0-management: size=588 max=0 total=0
[2015-02-14 06:32:13.731326] I [mem-pool.c:545:mem_pool_destroy] 0-management: size=124 max=0 total=0
[2015-02-14 06:32:13.731356] I [glusterd-pmap.c:271:pmap_registry_remove] 0-pmap: removing brick /exp/br02/brick2 on port 49153
[2015-02-14 06:32:13.823129] I [socket.c:2344:socket_event_handler] 0-transport: disconnecting now
[2015-02-14 06:32:13.840668] W [socket.c:611:__socket_rwv] 0-management: readv on /var/run/7565ec897c6454bd3e2f4800250a7221.socket failed (Invalid argument)
[2015-02-14 06:32:13.840693] I [MSGID: 106006] [glusterd-handler.c:4257:__glusterd_nodesvc_rpc_notify] 0-management: nfs has disconnected from glusterd.
[2015-02-14 06:32:13.840712] W [socket.c:611:__socket_rwv] 0-management: readv on /var/run/ac4c043d3c6a2e5159c86e8c75c51829.socket failed (Invalid argument)
[2015-02-14 06:32:13.840728] I [MSGID: 106006] [glusterd-handler.c:4257:__glusterd_nodesvc_rpc_notify] 0-management: glustershd has disconnected from glusterd.
[2015-02-14 06:32:14.729667] E [glusterd-rpc-ops.c:1169:__glusterd_commit_op_cbk] 0-management: Received commit RJT from uuid: 294aa603-ec24-44b9-864b-0fe743faa8d9
[2015-02-14 06:32:14.743623] E [glusterd-rpc-ops.c:1169:__glusterd_commit_op_cbk] 0-management: Received commit RJT from uuid: 92aabaf4-4b6c-48da-82b6-c465aff2ec6d
[2015-02-14 06:32:18.762975] W [glusterd-op-sm.c:4021:glusterd_op_modify_op_ctx] 0-management: op_ctx modification failed
[2015-02-14 06:32:18.764552] I [glusterd-handler.c:3803:__glusterd_handle_status_volume] 0-management: Received status volume req for volume Storage1
[2015-02-14 06:32:18.769051] E [glusterd-utils.c:9955:glusterd_volume_status_aggregate_tasks_status] 0-management: Local tasks count (0) and remote tasks count (1) do not match. Not aggregating tasks status.
[2015-02-14 06:32:18.769070] E [glusterd-syncop.c:961:_gd_syncop_commit_op_cbk] 0-management: Failed to aggregate response from node/brick
[2015-02-14 06:32:18.771095] E [glusterd-utils.c:9955:glusterd_volume_status_aggregate_tasks_status] 0-management: Local tasks count (0) and remote tasks count (1) do not match. Not aggregating tasks status.
[2015-02-14 06:32:18.771108] E [glusterd-syncop.c:961:_gd_syncop_commit_op_cbk] 0-management: Failed to aggregate response from node/brick
[2015-02-14 06:32:48.570796] W [glusterd-op-sm.c:4021:glusterd_op_modify_op_ctx] 0-management: op_ctx modification failed
[2015-02-14 06:32:48.572352] I [glusterd-handler.c:3803:__glusterd_handle_status_volume] 0-management: Received status volume req for volume Storage1
[2015-02-14 06:32:48.576899] E [glusterd-utils.c:9955:glusterd_volume_status_aggregate_tasks_status] 0-management: Local tasks count (0) and remote tasks count (1) do not match. Not aggregating tasks status.
[2015-02-14 06:32:48.576918] E [glusterd-syncop.c:961:_gd_syncop_commit_op_cbk] 0-management: Failed to aggregate response from node/brick
[2015-02-14 06:32:48.578982] E [glusterd-utils.c:9955:glusterd_volume_status_aggregate_tasks_status] 0-management: Local tasks count (0) and remote tasks count (1) do not match. Not aggregating tasks status.
[2015-02-14 06:32:48.579001] E [glusterd-syncop.c:961:_gd_syncop_commit_op_cbk] 0-management: Failed to aggregate response from node/brick
[2015-02-14 06:36:57.840738] W [glusterd-op-sm.c:4021:glusterd_op_modify_op_ctx] 0-management: op_ctx modification failed
[2015-02-14 06:36:57.842370] I [glusterd-handler.c:3803:__glusterd_handle_status_volume] 0-management: Received status volume req for volume Storage1
[2015-02-14 06:36:57.846919] E [glusterd-utils.c:9955:glusterd_volume_status_aggregate_tasks_status] 0-management: Local tasks count (0) and remote tasks count (1) do not match. Not aggregating tasks status.
[2015-02-14 06:36:57.846941] E [glusterd-syncop.c:961:_gd_syncop_commit_op_cbk] 0-management: Failed to aggregate response from node/brick
[2015-02-14 06:36:57.849026] E [glusterd-utils.c:9955:glusterd_volume_status_aggregate_tasks_status] 0-management: Local tasks count (0) and remote tasks count (1) do not match. Not aggregating tasks status.
[2015-02-14 06:36:57.849046] E [glusterd-syncop.c:961:_gd_syncop_commit_op_cbk] 0-management: Failed to aggregate response from node/brick
[2015-02-14 06:37:20.208081] W [glusterd-op-sm.c:4021:glusterd_op_modify_op_ctx] 0-management: op_ctx modification failed
[2015-02-14 06:37:20.211279] I [glusterd-handler.c:3803:__glusterd_handle_status_volume] 0-management: Received status volume req for volume Storage1
[2015-02-14 06:37:20.215792] E [glusterd-utils.c:9955:glusterd_volume_status_aggregate_tasks_status] 0-management: Local tasks count (0) and remote tasks count (1) do not match. Not aggregating tasks status.
[2015-02-14 06:37:20.215809] E [glusterd-syncop.c:961:_gd_syncop_commit_op_cbk] 0-management: Failed to aggregate response from node/brick
[2015-02-14 06:37:20.216295] E [glusterd-utils.c:9955:glusterd_volume_status_aggregate_tasks_status] 0-management: Local tasks count (0) and remote tasks count (1) do not match. Not aggregating tasks status.
[2015-02-14 06:37:20.216308] E [glusterd-syncop.c:961:_gd_syncop_commit_op_cbk] 0-management: Failed to aggregate response from node/brick
Subrata Ghosh
2015-Feb-15 05:41 UTC
[Gluster-users] Gluster replace-brick issues (Distributed-Replica)
Please try the "commit force" option and see whether it works; as gluster suggests, all the other replace-brick commands have been deprecated:

[root at Server1 ~]# gluster volume replace-brick $volname $from $to start
All replace-brick commands except commit force are deprecated. Do you want to continue? (y/n) y
volume replace-brick: success: replace-brick started successfully
ID: 0062d555-e7eb-4ebe-a264-7e0baf6e7546

You could try the following sequence, which we used to resolve our own replace-brick scenario — it looks close to your case. (Note that "gluster volume heal ... info" has some issues, which I requested clarification on a few days back.)

# gluster volume replace-brick $vol_name $old_brick $new_brick commit force
# gluster volume heal $vol_name full
# gluster volume heal $vol_name info   --> this shows that the number of files = 1, even though the file is already healed

Regards,
Subrata

On 02/14/2015 02:28 PM, Thomas Holkenbrink wrote:
> [original message quoted in full]
> > [2015-02-14 06:32:14.729667] E > [glusterd-rpc-ops.c:1169:__glusterd_commit_op_cbk] 0-management: > Received commit RJT from uuid: 294aa603-ec24-44b9-864b-0fe743faa8d9 > > [2015-02-14 06:32:14.743623] E > [glusterd-rpc-ops.c:1169:__glusterd_commit_op_cbk] 0-management: > Received commit RJT from uuid: 92aabaf4-4b6c-48da-82b6-c465aff2ec6d > > [2015-02-14 06:32:18.762975] W > [glusterd-op-sm.c:4021:glusterd_op_modify_op_ctx] 0-management: op_ctx > modification failed > > [2015-02-14 06:32:18.764552] I > [glusterd-handler.c:3803:__glusterd_handle_status_volume] > 0-management: Received status volume req for volume Storage1 > > [2015-02-14 06:32:18.769051] E > [glusterd-utils.c:9955:glusterd_volume_status_aggregate_tasks_status] > 0-management: Local tasks count (0) and remote tasks count (1) do not > match. Not aggregating tasks status. > > [2015-02-14 06:32:18.769070] E > [glusterd-syncop.c:961:_gd_syncop_commit_op_cbk] 0-management: Failed > to aggregate response from node/brick > > [2015-02-14 06:32:18.771095] E > [glusterd-utils.c:9955:glusterd_volume_status_aggregate_tasks_status] > 0-management: Local tasks count (0) and remote tasks count (1) do not > match. Not aggregating tasks status. > > [2015-02-14 06:32:18.771108] E > [glusterd-syncop.c:961:_gd_syncop_commit_op_cbk] 0-management: Failed > to aggregate response from node/brick > > [2015-02-14 06:32:48.570796] W > [glusterd-op-sm.c:4021:glusterd_op_modify_op_ctx] 0-management: op_ctx > modification failed > > [2015-02-14 06:32:48.572352] I > [glusterd-handler.c:3803:__glusterd_handle_status_volume] > 0-management: Received status volume req for volume Storage1 > > [2015-02-14 06:32:48.576899] E > [glusterd-utils.c:9955:glusterd_volume_status_aggregate_tasks_status] > 0-management: Local tasks count (0) and remote tasks count (1) do not > match. Not aggregating tasks status. 
> > [2015-02-14 06:32:48.576918] E > [glusterd-syncop.c:961:_gd_syncop_commit_op_cbk] 0-management: Failed > to aggregate response from node/brick > > [2015-02-14 06:32:48.578982] E > [glusterd-utils.c:9955:glusterd_volume_status_aggregate_tasks_status] > 0-management: Local tasks count (0) and remote tasks count (1) do not > match. Not aggregating tasks status. > > [2015-02-14 06:32:48.579001] E > [glusterd-syncop.c:961:_gd_syncop_commit_op_cbk] 0-management: Failed > to aggregate response from node/brick > > [2015-02-14 06:36:57.840738] W > [glusterd-op-sm.c:4021:glusterd_op_modify_op_ctx] 0-management: op_ctx > modification failed > > [2015-02-14 06:36:57.842370] I > [glusterd-handler.c:3803:__glusterd_handle_status_volume] > 0-management: Received status volume req for volume Storage1 > > [2015-02-14 06:36:57.846919] E > [glusterd-utils.c:9955:glusterd_volume_status_aggregate_tasks_status] > 0-management: Local tasks count (0) and remote tasks count (1) do not > match. Not aggregating tasks status. > > [2015-02-14 06:36:57.846941] E > [glusterd-syncop.c:961:_gd_syncop_commit_op_cbk] 0-management: Failed > to aggregate response from node/brick > > [2015-02-14 06:36:57.849026] E > [glusterd-utils.c:9955:glusterd_volume_status_aggregate_tasks_status] > 0-management: Local tasks count (0) and remote tasks count (1) do not > match. Not aggregating tasks status. 
> > [2015-02-14 06:36:57.849046] E > [glusterd-syncop.c:961:_gd_syncop_commit_op_cbk] 0-management: Failed > to aggregate response from node/brick > > [2015-02-14 06:37:20.208081] W > [glusterd-op-sm.c:4021:glusterd_op_modify_op_ctx] 0-management: op_ctx > modification failed > > [2015-02-14 06:37:20.211279] I > [glusterd-handler.c:3803:__glusterd_handle_status_volume] > 0-management: Received status volume req for volume Storage1 > > [2015-02-14 06:37:20.215792] E > [glusterd-utils.c:9955:glusterd_volume_status_aggregate_tasks_status] > 0-management: Local tasks count (0) and remote tasks count (1) do not > match. Not aggregating tasks status. > > [2015-02-14 06:37:20.215809] E > [glusterd-syncop.c:961:_gd_syncop_commit_op_cbk] 0-management: Failed > to aggregate response from node/brick > > [2015-02-14 06:37:20.216295] E > [glusterd-utils.c:9955:glusterd_volume_status_aggregate_tasks_status] > 0-management: Local tasks count (0) and remote tasks count (1) do not > match. Not aggregating tasks status. > > [2015-02-14 06:37:20.216308] E > [glusterd-syncop.c:961:_gd_syncop_commit_op_cbk] 0-management: Failed > to aggregate response from node/brick > > > > _______________________________________________ > Gluster-users mailing list > Gluster-users at gluster.org > http://www.gluster.org/mailman/listinfo/gluster-users-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://www.gluster.org/pipermail/gluster-users/attachments/20150215/a1da4060/attachment.html>
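The deprecation warning in the transcript ("All replace-brick commands except commit force are deprecated") points at the path that still works in 3.6: skip the broken data-migration mode, swap in the empty brick with `commit force`, and let self-heal copy the data from the surviving replica. A minimal dry-run sketch, using the volume and brick names from this thread — note this volume has all three `cluster.*-self-heal` options switched off, so they would have to be re-enabled first; verify against your own cluster before executing anything:

```shell
# Dry-run sketch only: "run" prints each command instead of executing it,
# so the sequence can be reviewed first. Replace "run" with nothing to
# execute for real. Volume/brick names are the ones from this thread.
run() { echo "+ $*"; }

volname=Storage1
from=Server1:/exp/br02/brick2
to=Server3:/exp/br02/brick2

# This volume has self-heal switched off; re-enable it first, or the new
# brick never gets populated:
run gluster volume set "$volname" cluster.entry-self-heal on
run gluster volume set "$volname" cluster.data-self-heal on
run gluster volume set "$volname" cluster.metadata-self-heal on

# Swap the brick without data migration; AFR self-heal then copies from
# the surviving replica (Server2:/exp/br02/brick2):
run gluster volume replace-brick "$volname" "$from" "$to" commit force

# Kick off a full heal and watch its progress:
run gluster volume heal "$volname" full
run gluster volume heal "$volname" info
```

This is a sketch of the generally documented replace-brick procedure, not something confirmed by the participants in this thread.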
Joe Julian
2015-Feb-15 18:54 UTC
[Gluster-users] Gluster replicate-brick issues (Distributed-Replica)
Of those missing files, are they maybe DHT link files? Mode 1000, size 0.

On February 14, 2015 12:58:12 AM PST, Thomas Holkenbrink <thomas.holkenbrink at fibercloud.com> wrote:
>We have tried to migrate a brick from one server to another using the
>following commands, but the data is NOT being replicated... and the
>brick is not showing up anymore.
>Gluster still appears to be working, but the bricks are not balanced,
>and I need to add the other brick for Server3, which I don't want to do
>until after Server1:/exp/br02/brick2 gets replicated.
>
>This is the command used to create the original volume:
>[root at Server1 ~]# gluster volume create Storage1 replica 2 transport
>tcp Server1:/exp/br01/brick1 Server2:/exp/br01/brick1
>Server1:/exp/br02/brick2 Server2:/exp/br02/brick2
>
>This is the current configuration BEFORE the migration. Server3 has
>been peer probed successfully, but that is all.
>[root at Server1 ~]# gluster --version
>glusterfs 3.6.2 built on Jan 22 2015 12:58:11
>
>[root at Server1 ~]# gluster volume status
>Status of volume: Storage1
>Gluster process                          Port    Online  Pid
>------------------------------------------------------------------------------
>Brick Server1:/exp/br01/brick1           49152   Y       2167
>Brick Server2:/exp/br01/brick1           49152   Y       2192
>Brick Server1:/exp/br02/brick2           49153   Y       2172  <--- this is the one that goes missing
>Brick Server2:/exp/br02/brick2           49153   Y       2193
>NFS Server on localhost                  2049    Y       2181
>Self-heal Daemon on localhost            N/A     Y       2186
>NFS Server on Server2                    2049    Y       2205
>Self-heal Daemon on Server2              N/A     Y       2210
>NFS Server on Server3                    2049    Y       6015
>Self-heal Daemon on Server3              N/A     Y       6016
>
>Task Status of Volume Storage1
>------------------------------------------------------------------------------
>There are no active volume tasks
>
>[root at Server1 ~]# gluster volume info
>
>Volume Name: Storage1
>Type: Distributed-Replicate
>Volume ID: 9616ce42-48bd-4fe3-883f-decd6c4fcd00
>Status: Started
>Number of Bricks: 2 x 2 = 4
>Transport-type: tcp
>Bricks:
>Brick1: Server1:/exp/br01/brick1
>Brick2: Server2:/exp/br01/brick1
>Brick3: Server1:/exp/br02/brick2
>Brick4: Server2:/exp/br02/brick2
>Options Reconfigured:
>diagnostics.brick-log-level: WARNING
>diagnostics.client-log-level: WARNING
>cluster.entry-self-heal: off
>cluster.data-self-heal: off
>cluster.metadata-self-heal: off
>performance.cache-size: 1024MB
>performance.cache-max-file-size: 2MB
>performance.cache-refresh-timeout: 1
>performance.stat-prefetch: off
>performance.read-ahead: on
>performance.quick-read: off
>performance.write-behind-window-size: 4MB
>performance.flush-behind: on
>performance.write-behind: on
>performance.io-thread-count: 32
>performance.io-cache: on
>network.ping-timeout: 2
>nfs.addr-namelookup: off
>performance.strict-write-ordering: on
>[root at Server1 ~]#
>
>So we start the migration of the brick to the new server using the
>replace-brick command:
>[root at Server1 ~]# volname=Storage1
>[root at Server1 ~]# from=Server1:/exp/br02/brick2
>[root at Server1 ~]# to=Server3:/exp/br02/brick2
>
>[root at Server1 ~]# gluster volume replace-brick $volname $from $to start
>All replace-brick commands except commit force are deprecated. Do you
>want to continue? (y/n) y
>volume replace-brick: success: replace-brick started successfully
>ID: 0062d555-e7eb-4ebe-a264-7e0baf6e7546
>
>[root at Server1 ~]# gluster volume replace-brick $volname $from $to status
>All replace-brick commands except commit force are deprecated. Do you
>want to continue? (y/n) y
>volume replace-brick: success: Number of files migrated = 281
>Migration complete
>
>At this point everything seems to be in order with no outstanding
>issues.
>
>[root at Server1 ~]# gluster volume status
>Status of volume: Storage1
>Gluster process                          Port    Online  Pid
>------------------------------------------------------------------------------
>Brick Server1:/exp/br01/brick1           49152   Y       2167
>Brick Server2:/exp/br01/brick1           49152   Y       2192
>Brick Server1:/exp/br02/brick2           49153   Y       27557
>Brick Server2:/exp/br02/brick2           49153   Y       2193
>NFS Server on localhost                  2049    Y       27562
>Self-heal Daemon on localhost            N/A     Y       2186
>NFS Server on Server2                    2049    Y       2205
>Self-heal Daemon on Server2              N/A     Y       2210
>NFS Server on Server3                    2049    Y       6015
>Self-heal Daemon on Server3              N/A     Y       6016
>
>Task Status of Volume Storage1
>------------------------------------------------------------------------------
>Task              : Replace brick
>ID                : 0062d555-e7eb-4ebe-a264-7e0baf6e7546
>Source Brick      : Server1:/exp/br02/brick2
>Destination Brick : Server3:/exp/br02/brick2
>Status            : completed
>
>The volume reports that the replace-brick command completed, so the
>next step is to commit the change:
>
>[root at Server1 ~]# gluster volume replace-brick $volname $from $to commit
>All replace-brick commands except commit force are deprecated. Do you
>want to continue? (y/n) y
>volume replace-brick: success: replace-brick commit successful
>
>At this point when I look at the status, the OLD brick
>(Server1:/exp/br02/brick2) is now missing AND I don't see the new
>brick... WTF... panic!
>
>[root at Server1 ~]# gluster volume status
>Status of volume: Storage1
>Gluster process                          Port    Online  Pid
>------------------------------------------------------------------------------
>Brick Server1:/exp/br01/brick1           49152   Y       2167
>Brick Server2:/exp/br01/brick1           49152   Y       2192
>Brick Server2:/exp/br02/brick2           49153   Y       2193
>NFS Server on localhost                  2049    Y       28906
>Self-heal Daemon on localhost            N/A     Y       28911
>NFS Server on Server2                    2049    Y       2205
>Self-heal Daemon on Server2              N/A     Y       2210
>NFS Server on Server3                    2049    Y       6015
>Self-heal Daemon on Server3              N/A     Y       6016
>
>Task Status of Volume Storage1
>------------------------------------------------------------------------------
>There are no active volume tasks
>
>After the commit, Server1 no longer lists the task... yet Server2 and
>Server3 see this:
>
>[root at Server2 ~]# gluster volume status
>Status of volume: Storage1
>Gluster process                          Port    Online  Pid
>------------------------------------------------------------------------------
>Brick Server1:/exp/br01/brick1           49152   Y       2167
>Brick Server2:/exp/br01/brick1           49152   Y       2192
>Brick Server2:/exp/br02/brick2           49153   Y       2193
>NFS Server on localhost                  2049    Y       2205
>Self-heal Daemon on localhost            N/A     Y       2210
>NFS Server on 10.45.16.17                2049    Y       28906
>Self-heal Daemon on 10.45.16.17          N/A     Y       28911
>NFS Server on server3                    2049    Y       6015
>Self-heal Daemon on server3              N/A     Y       6016
>
>Task Status of Volume Storage1
>------------------------------------------------------------------------------
>Task              : Replace brick
>ID                : 0062d555-e7eb-4ebe-a264-7e0baf6e7546
>Source Brick      : Server1:/exp/br02/brick2
>Destination Brick : server3:/exp/br02/brick2
>Status            : completed
>
>If I navigate the brick on Server3, the brick is NOT empty... but
>missing A LOT! It's as if the replace-brick stopped and never restarted.
>The replace-brick reported "Number of files migrated = 281
>Migration complete", but when I look on the Server3 brick I get:
>[root at Server3 brick2]# find . -type f -print | wc -l
>16
>
>I'm missing 265 files (they still exist on the OLD brick... but how can
>I move them?)
>
>If I try to add the old brick back together with another brick on the
>new server, as in:
>[root at Server1 ~]# gluster volume add-brick Storage1
>Server1:/exp/br02/brick2 Server3:/exp/br01/brick1
>volume add-brick: failed: /exp/br02/brick2 is already part of a volume
>
>I'm fearful of running:
>[root at Server1 ~]# setfattr -n trusted.glusterfs.volume-id -v 0x$(grep
>volume-id /var/lib/glusterd/vols/$volname/info | cut -d= -f2 | sed
>'s/-//g') /exp/br02/brick2
>although it should allow me to add the brick.
>
>Gluster heal info returns:
>[root at Server2 ~]# gluster volume heal Storage1 info
>Brick Server1:/exp/br01/brick1/
>Number of entries: 0
>
>Brick Server2:/exp/br01/brick1/
>Number of entries: 0
>
>Brick Server1:/exp/br02/brick2
>Status: Transport endpoint is not connected
>
>Brick Server2:/exp/br02/brick2/
>Number of entries: 0
>
>I have restarted glusterd numerous times.
>
>At this time I'm not sure where to go from here. I know that
>Server1:/exp/br02/brick2 still has all the data, and
>Server3:/exp/br01/brick1 is not complete.
>
>How do I actually get the brick to replicate?
>How can I add Server1:/exp/br02/brick2 back into the trusted pool if I
>can't replicate it, or re-add it?
>How can I fix this to get it back into a replicated state between the
>three servers?
>
>Thomas
>
>
>----DATA----
>
>Gluster volume info at this point:
>[root at Server1 ~]# gluster volume info
>
>Volume Name: Storage1
>Type: Distributed-Replicate
>Volume ID: 9616ce42-48bd-4fe3-883f-decd6c4fcd00
>Status: Started
>Number of Bricks: 2 x 2 = 4
>Transport-type: tcp
>Bricks:
>Brick1: Server1:/exp/br01/brick1
>Brick2: Server2:/exp/br01/brick1
>Brick3: server3:/exp/br02/brick2
>Brick4: Server2:/exp/br02/brick2
>Options Reconfigured:
>diagnostics.brick-log-level: WARNING
>diagnostics.client-log-level: WARNING
>cluster.entry-self-heal: off
>cluster.data-self-heal: off
>cluster.metadata-self-heal: off
>performance.cache-size: 1024MB
>performance.cache-max-file-size: 2MB
>performance.cache-refresh-timeout: 1
>performance.stat-prefetch: off
>performance.read-ahead: on
>performance.quick-read: off
>performance.write-behind-window-size: 4MB
>performance.flush-behind: on
>performance.write-behind: on
>performance.io-thread-count: 32
>performance.io-cache: on
>network.ping-timeout: 2
>nfs.addr-namelookup: off
>performance.strict-write-ordering: on
>[root at Server1 ~]#
>
>[root at server3 brick2]# gluster volume heal Storage1 info
>Brick Server1:/exp/br01/brick1/
>Number of entries: 0
>
>Brick Server2:/exp/br01/brick1/
>Number of entries: 0
>
>Brick Server3:/exp/br02/brick2/
>Number of entries: 0
>
>Brick Server2:/exp/br02/brick2/
>Number of entries: 0
>
>
>Gluster LOG (there are a few errors, but I'm not sure how to decipher
>them):
>
>[2015-02-14 06:29:19.862809] I [MSGID: 106005] [glusterd-handler.c:4142:__glusterd_brick_rpc_notify] 0-management: Brick Server1:/exp/br02/brick2 has disconnected from glusterd.
>[2015-02-14 06:29:19.862836] W [socket.c:611:__socket_rwv] 0-management: readv on /var/run/7565ec897c6454bd3e2f4800250a7221.socket failed (Invalid argument)
>[2015-02-14 06:29:19.862853] I [MSGID: 106006] [glusterd-handler.c:4257:__glusterd_nodesvc_rpc_notify] 0-management: nfs has disconnected from glusterd.
>[2015-02-14 06:29:19.953762] I [glusterd-pmap.c:227:pmap_registry_bind] 0-pmap: adding brick /exp/br02/brick2 on port 49153
>[2015-02-14 06:31:12.977450] I [glusterd-replace-brick.c:99:__glusterd_handle_replace_brick] 0-management: Received replace brick req
>[2015-02-14 06:31:12.977495] I [glusterd-replace-brick.c:154:__glusterd_handle_replace_brick] 0-management: Received replace brick status request
>[2015-02-14 06:31:13.048852] I [glusterd-replace-brick.c:1412:rb_update_srcbrick_port] 0-: adding src-brick port no
>[2015-02-14 06:31:19.588380] I [glusterd-replace-brick.c:99:__glusterd_handle_replace_brick] 0-management: Received replace brick req
>[2015-02-14 06:31:19.588422] I [glusterd-replace-brick.c:154:__glusterd_handle_replace_brick] 0-management: Received replace brick status request
>[2015-02-14 06:31:19.661101] I [glusterd-replace-brick.c:1412:rb_update_srcbrick_port] 0-: adding src-brick port no
>[2015-02-14 06:31:45.115355] W [glusterd-op-sm.c:4021:glusterd_op_modify_op_ctx] 0-management: op_ctx modification failed
>[2015-02-14 06:31:45.118597] I [glusterd-handler.c:3803:__glusterd_handle_status_volume] 0-management: Received status volume req for volume Storage1
>[2015-02-14 06:32:10.956357] I [glusterd-replace-brick.c:99:__glusterd_handle_replace_brick] 0-management: Received replace brick req
>[2015-02-14 06:32:10.956385] I [glusterd-replace-brick.c:154:__glusterd_handle_replace_brick] 0-management: Received replace brick commit request
>[2015-02-14 06:32:11.028472] I [glusterd-replace-brick.c:1412:rb_update_srcbrick_port] 0-: adding src-brick port no
>[2015-02-14 06:32:12.122552] I [glusterd-utils.c:6276:glusterd_nfs_pmap_deregister] 0-: De-registered MOUNTV3 successfully
>[2015-02-14 06:32:12.131836] I [glusterd-utils.c:6281:glusterd_nfs_pmap_deregister] 0-: De-registered MOUNTV1 successfully
>[2015-02-14 06:32:12.141107] I [glusterd-utils.c:6286:glusterd_nfs_pmap_deregister] 0-: De-registered NFSV3 successfully
>[2015-02-14 06:32:12.150375] I [glusterd-utils.c:6291:glusterd_nfs_pmap_deregister] 0-: De-registered NLM v4 successfully
>[2015-02-14 06:32:12.159630] I [glusterd-utils.c:6296:glusterd_nfs_pmap_deregister] 0-: De-registered NLM v1 successfully
>[2015-02-14 06:32:12.168889] I [glusterd-utils.c:6301:glusterd_nfs_pmap_deregister] 0-: De-registered ACL v3 successfully
>[2015-02-14 06:32:13.254689] I [rpc-clnt.c:969:rpc_clnt_connection_init] 0-management: setting frame-timeout to 600
>[2015-02-14 06:32:13.254799] W [socket.c:2992:socket_connect] 0-management: Ignore failed connection attempt on , (No such file or directory)
>[2015-02-14 06:32:13.257790] I [rpc-clnt.c:969:rpc_clnt_connection_init] 0-management: setting frame-timeout to 600
>[2015-02-14 06:32:13.257908] W [socket.c:2992:socket_connect] 0-management: Ignore failed connection attempt on , (No such file or directory)
>[2015-02-14 06:32:13.258031] W [socket.c:611:__socket_rwv] 0-socket.management: writev on 127.0.0.1:1019 failed (Broken pipe)
>[2015-02-14 06:32:13.258111] W [socket.c:611:__socket_rwv] 0-socket.management: writev on 127.0.0.1:1021 failed (Broken pipe)
>[2015-02-14 06:32:13.258130] W [socket.c:611:__socket_rwv] 0-socket.management: writev on 10.45.16.17:1018 failed (Broken pipe)
>[2015-02-14 06:32:13.711948] I [mem-pool.c:545:mem_pool_destroy] 0-management: size=588 max=0 total=0
>[2015-02-14 06:32:13.711967] I [mem-pool.c:545:mem_pool_destroy] 0-management: size=124 max=0 total=0
>[2015-02-14 06:32:13.712008] I [mem-pool.c:545:mem_pool_destroy] 0-management: size=588 max=0 total=0
>[2015-02-14 06:32:13.712021] I [mem-pool.c:545:mem_pool_destroy] 0-management: size=124 max=0 total=0
>[2015-02-14 06:32:13.731311] I [mem-pool.c:545:mem_pool_destroy] 0-management: size=588 max=0 total=0
>[2015-02-14 06:32:13.731326] I [mem-pool.c:545:mem_pool_destroy] 0-management: size=124 max=0 total=0
>[2015-02-14 06:32:13.731356] I [glusterd-pmap.c:271:pmap_registry_remove] 0-pmap: removing brick /exp/br02/brick2 on port 49153
>[2015-02-14 06:32:13.823129] I [socket.c:2344:socket_event_handler] 0-transport: disconnecting now
>[2015-02-14 06:32:13.840668] W [socket.c:611:__socket_rwv] 0-management: readv on /var/run/7565ec897c6454bd3e2f4800250a7221.socket failed (Invalid argument)
>[2015-02-14 06:32:13.840693] I [MSGID: 106006] [glusterd-handler.c:4257:__glusterd_nodesvc_rpc_notify] 0-management: nfs has disconnected from glusterd.
>[2015-02-14 06:32:13.840712] W [socket.c:611:__socket_rwv] 0-management: readv on /var/run/ac4c043d3c6a2e5159c86e8c75c51829.socket failed (Invalid argument)
>[2015-02-14 06:32:13.840728] I [MSGID: 106006] [glusterd-handler.c:4257:__glusterd_nodesvc_rpc_notify] 0-management: glustershd has disconnected from glusterd.
>[2015-02-14 06:32:14.729667] E [glusterd-rpc-ops.c:1169:__glusterd_commit_op_cbk] 0-management: Received commit RJT from uuid: 294aa603-ec24-44b9-864b-0fe743faa8d9
>[2015-02-14 06:32:14.743623] E [glusterd-rpc-ops.c:1169:__glusterd_commit_op_cbk] 0-management: Received commit RJT from uuid: 92aabaf4-4b6c-48da-82b6-c465aff2ec6d
>[2015-02-14 06:32:18.762975] W [glusterd-op-sm.c:4021:glusterd_op_modify_op_ctx] 0-management: op_ctx modification failed
>[2015-02-14 06:32:18.764552] I [glusterd-handler.c:3803:__glusterd_handle_status_volume] 0-management: Received status volume req for volume Storage1
>[2015-02-14 06:32:18.769051] E [glusterd-utils.c:9955:glusterd_volume_status_aggregate_tasks_status] 0-management: Local tasks count (0) and remote tasks count (1) do not match. Not aggregating tasks status.
>[2015-02-14 06:32:18.769070] E [glusterd-syncop.c:961:_gd_syncop_commit_op_cbk] 0-management: Failed to aggregate response from node/brick
>[2015-02-14 06:32:18.771095] E [glusterd-utils.c:9955:glusterd_volume_status_aggregate_tasks_status] 0-management: Local tasks count (0) and remote tasks count (1) do not match. Not aggregating tasks status.
>[2015-02-14 06:32:18.771108] E [glusterd-syncop.c:961:_gd_syncop_commit_op_cbk] 0-management: Failed to aggregate response from node/brick
>[2015-02-14 06:32:48.570796] W [glusterd-op-sm.c:4021:glusterd_op_modify_op_ctx] 0-management: op_ctx modification failed
>[2015-02-14 06:32:48.572352] I [glusterd-handler.c:3803:__glusterd_handle_status_volume] 0-management: Received status volume req for volume Storage1
>[2015-02-14 06:32:48.576899] E [glusterd-utils.c:9955:glusterd_volume_status_aggregate_tasks_status] 0-management: Local tasks count (0) and remote tasks count (1) do not match. Not aggregating tasks status.
>[2015-02-14 06:32:48.576918] E [glusterd-syncop.c:961:_gd_syncop_commit_op_cbk] 0-management: Failed to aggregate response from node/brick
>[2015-02-14 06:32:48.578982] E [glusterd-utils.c:9955:glusterd_volume_status_aggregate_tasks_status] 0-management: Local tasks count (0) and remote tasks count (1) do not match. Not aggregating tasks status.
>[2015-02-14 06:32:48.579001] E [glusterd-syncop.c:961:_gd_syncop_commit_op_cbk] 0-management: Failed to aggregate response from node/brick
>[2015-02-14 06:36:57.840738] W [glusterd-op-sm.c:4021:glusterd_op_modify_op_ctx] 0-management: op_ctx modification failed
>[2015-02-14 06:36:57.842370] I [glusterd-handler.c:3803:__glusterd_handle_status_volume] 0-management: Received status volume req for volume Storage1
>[2015-02-14 06:36:57.846919] E [glusterd-utils.c:9955:glusterd_volume_status_aggregate_tasks_status] 0-management: Local tasks count (0) and remote tasks count (1) do not match. Not aggregating tasks status.
>[2015-02-14 06:36:57.846941] E [glusterd-syncop.c:961:_gd_syncop_commit_op_cbk] 0-management: Failed to aggregate response from node/brick
>[2015-02-14 06:36:57.849026] E [glusterd-utils.c:9955:glusterd_volume_status_aggregate_tasks_status] 0-management: Local tasks count (0) and remote tasks count (1) do not match. Not aggregating tasks status.
>[2015-02-14 06:36:57.849046] E [glusterd-syncop.c:961:_gd_syncop_commit_op_cbk] 0-management: Failed to aggregate response from node/brick
>[2015-02-14 06:37:20.208081] W [glusterd-op-sm.c:4021:glusterd_op_modify_op_ctx] 0-management: op_ctx modification failed
>[2015-02-14 06:37:20.211279] I [glusterd-handler.c:3803:__glusterd_handle_status_volume] 0-management: Received status volume req for volume Storage1
>[2015-02-14 06:37:20.215792] E [glusterd-utils.c:9955:glusterd_volume_status_aggregate_tasks_status] 0-management: Local tasks count (0) and remote tasks count (1) do not match. Not aggregating tasks status.
>[2015-02-14 06:37:20.215809] E [glusterd-syncop.c:961:_gd_syncop_commit_op_cbk] 0-management: Failed to aggregate response from node/brick
>[2015-02-14 06:37:20.216295] E [glusterd-utils.c:9955:glusterd_volume_status_aggregate_tasks_status] 0-management: Local tasks count (0) and remote tasks count (1) do not match. Not aggregating tasks status.
>[2015-02-14 06:37:20.216308] E [glusterd-syncop.c:961:_gd_syncop_commit_op_cbk] 0-management: Failed to aggregate response from node/brick
>
>------------------------------------------------------------------------
>
>_______________________________________________
>Gluster-users mailing list
>Gluster-users at gluster.org
>http://www.gluster.org/mailman/listinfo/gluster-users

-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.
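On Joe's question above: DHT link files are the zero-byte pointer files whose mode is exactly 1000 (sticky bit only, shown as `---------T` by `ls -l`) that DHT leaves behind when a file's name hashes to one brick but its data lives on another. A quick way to check whether the 16 files found on the new brick are real data or just link files — a sketch, with the brick path taken from the thread, to be run on Server3:

```shell
# Count DHT link files (zero-byte regular files with only the sticky bit
# set) versus real files on a brick. Brick path is the one from the thread;
# Gluster's internal .glusterfs tree is skipped.
BRICK=/exp/br02/brick2

# Link files: sticky bit set, size 0.
find "$BRICK" -path "$BRICK/.glusterfs" -prune -o \
     -type f -perm -1000 -size 0 -print | wc -l

# Real files: regular files without the sticky bit.
find "$BRICK" -path "$BRICK/.glusterfs" -prune -o \
     -type f ! -perm -1000 -print | wc -l
```

If the first count accounts for most of the 16 entries, the migration copied only the DHT pointers and the actual data never left the old brick, which would match the symptoms described.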