Anup Nair
2013-Sep-03 07:48 UTC
[Gluster-users] How to remove a dead node and re-balance volume?
Glusterfs version: 3.2.2

I have a Gluster volume in which one out of the 4 peers/nodes crashed some time ago, before I joined here. I see from volume info that the crashed (non-existent) node is still listed as one of the peers and its bricks are also listed. I would like to detach this node and its bricks and rebalance the volume across the remaining 3 peers, but I am unable to do so. Here are my steps:

1. # gluster peer status
   Number of Peers: 3   (note: excluding the one I run this command from)

   Hostname: dbstore4r294   (note: the node/peer that is down)
   Uuid: 8bf13458-1222-452c-81d3-565a563d768a
   State: Peer in Cluster (Disconnected)

   Hostname: 172.16.1.90
   Uuid: 77ebd7e4-7960-4442-a4a4-00c5b99a61b4
   State: Peer in Cluster (Connected)

   Hostname: dbstore3r294
   Uuid: 23d7a18c-fe57-47a0-afbc-1e1a5305c0eb
   State: Peer in Cluster (Connected)

2. # gluster peer detach dbstore4r294
   Brick(s) with the peer dbstore4r294 exist in cluster

3. # gluster volume info

   Volume Name: test-volume
   Type: Distributed-Replicate
   Status: Started
   Number of Bricks: 4 x 2 = 8
   Transport-type: tcp
   Bricks:
   Brick1: dbstore1r293:/datastore1
   Brick2: dbstore2r293:/datastore1
   Brick3: dbstore3r294:/datastore1
   Brick4: dbstore4r294:/datastore1
   Brick5: dbstore1r293:/datastore2
   Brick6: dbstore2r293:/datastore2
   Brick7: dbstore3r294:/datastore2
   Brick8: dbstore4r294:/datastore2
   Options Reconfigured:
   network.ping-timeout: 42s
   performance.cache-size: 64MB
   performance.write-behind-window-size: 3MB
   performance.io-thread-count: 8
   performance.cache-refresh-timeout: 2

   Note that the non-existent node/peer is dbstore4r294 (its bricks are /datastore1 and /datastore2, i.e. Brick4 and Brick8).

4. # gluster volume remove-brick test-volume dbstore4r294:/datastore1
   Removing brick(s) can result in data loss. Do you want to Continue? (y/n) y
   Remove brick incorrect brick count of 1 for replica 2

5. # gluster volume remove-brick test-volume dbstore4r294:/datastore1 dbstore4r294:/datastore2
   Removing brick(s) can result in data loss. Do you want to Continue? (y/n) y
   Bricks not from same subvol for replica

How do I remove the peer? What are the steps, considering that the node is non-existent?

Regards,
Anup Nair
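A note on the two errors above: in a distributed-replicate volume, bricks form replica sets in the order they are listed, so the subvolumes here are (Brick1, Brick2), (Brick3, Brick4), (Brick5, Brick6) and (Brick7, Brick8). Step 4 fails because bricks must be removed in multiples of the replica count (2), and step 5 fails because Brick4 and Brick8 belong to two different replica subvolumes. A syntactically valid remove-brick would have to name a complete pair, for example (a sketch only; on 3.2.x remove-brick does not migrate data off the bricks first, so this discards that subvolume's contents, including the healthy copy on dbstore3r294):

# gluster volume remove-brick test-volume dbstore3r294:/datastore1 dbstore4r294:/datastore1

Since the data is presumably wanted, the replace-brick approach in the reply below is the safer route.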
Vijay Bellur
2013-Sep-04 19:11 UTC
[Gluster-users] How to remove a dead node and re-balance volume?
On 09/03/2013 01:18 PM, Anup Nair wrote:
> Glusterfs version 3.2.2
>
> I have a Gluster volume in which one out of the 4 peers/nodes crashed
> some time ago, prior to my joining service here.
[...]
> How do I remove the peer? What are the steps considering that the node
> is non-existent?

Do you plan to replace the dead server with a new server? If so, this could be a possible sequence of steps:

1. Peer probe the new server and have two bricks ready on it.
2. volume replace-brick <volname> <brick4> <new-brick1> commit force
3. volume replace-brick <volname> <brick8> <new-brick2> commit force
4. Peer detach the dead server.
5. Since 3.2.2 is being used here, you would need a crawl (find . | xargs stat) from a client mount point to trigger self-healing for the newly added bricks.

-Vijay
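Concretely, assuming a replacement server named dbstore5r294 (a hypothetical hostname; substitute the real one) that exposes the same /datastore1 and /datastore2 brick paths, Vijay's outline would translate to something like this (a sketch, not verified against a live 3.2.2 cluster):

# gluster peer probe dbstore5r294
# gluster volume replace-brick test-volume dbstore4r294:/datastore1 dbstore5r294:/datastore1 commit force
# gluster volume replace-brick test-volume dbstore4r294:/datastore2 dbstore5r294:/datastore2 commit force
# gluster peer detach dbstore4r294

Once the dead peer no longer holds any bricks, the detach should not be refused. To trigger self-heal onto the new bricks (3.2.x has no self-heal daemon), crawl the volume from a client mount point, e.g. the hypothetical /mnt/test-volume:

# cd /mnt/test-volume
# find . | xargs stat > /dev/null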