Mario Splivalo
2016-May-01 12:03 UTC
[Gluster-users] Replacing failed node (2node replication)
Hello. I have set up glusterfs on two nodes, with a replicated volume across two bricks (one on each server):

root@glu-tru:~# gluster volume info

Volume Name: gv01
Type: Replicate
Volume ID: 28dcf1d4-8a6c-4b70-a075-9e1dc4215271
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: glu-tru:/srv/gfs-bucket
Brick2: glu-pre:/srv/gfs-bucket
root@glu-tru:~#

Now, I lost one server (glu-pre) and replaced it with a fresh one. The hostname and IP address of the new server are the same as the old one's.

This is how I re-added the fresh server:

glu-tru# gluster volume remove-brick gv01 replica 1 glu-pre:/srv/gfs-bucket
glu-tru# gluster peer detach glu-pre

Then I installed glusterfs on glu-pre (the freshly installed box), created the brick directory, and ran:

glu-tru# gluster peer probe glu-pre
glu-pre# gluster peer probe glu-tru
glu-pre# gluster volume add-brick gv01 replica 2 glu-pre:/srv/gfs-bucket force
glu-tru# gluster volume heal gv01 full

After the last command, /srv/gfs-bucket started to get populated on the newly added server (glu-pre).

Now, is this the proper procedure for replacing a failed server? I've read the documentation (http://tinyurl.com/j3pnjza, and for instance: https://support.rackspace.com/how-to/recover-from-a-failed-server-in-a-glusterfs-array/) - but those mention that the volume will be inaccessible during the healing period. They also seem more complicated, changing UUIDs in config files and so on.

With the commands I pasted above I had a perfectly fine running volume that was accessible the whole time while the new server was being re-added, and also during the healing period. (I'm using this for an HA setup for a Django application that writes a lot of custom files while working; while the volume was being healed I made sure that all the webapp traffic was hitting only the glu-tru node, the one that hadn't crashed.)

I'd appreciate some comments from more experienced glusterfs users.

	Mario

P.S. This is glusterfs 3.4 on Ubuntu 14.04.

--
Mario Splivalo
mario at splivalo.hr

"I can do it quick, I can do it cheap, I can do it well. Pick any two."
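For reference on the procedure above: peer state and heal progress can be checked from either node while the volume stays online. A minimal sketch using the standard gluster CLI, assuming the volume name gv01 from the post (the output format varies between gluster releases):

glu-tru# gluster peer status                          # confirm glu-pre is connected again
glu-tru# gluster volume status gv01                   # both bricks should show as online
glu-tru# gluster volume heal gv01 info                # entries still pending heal, per brick
glu-tru# gluster volume heal gv01 info split-brain    # anything needing manual attention

Once heal info reports no pending entries on either brick, the replica is back in sync; until then, files actively being healed may be briefly locked, as noted in the reply below.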
Kevin Lemonnier
2016-May-01 18:43 UTC
[Gluster-users] Replacing failed node (2node replication)
> With the commands I pasted above I had a perfectly fine running volume
> that was accessible the whole time while the new server was being
> re-added, and also during the healing period. (I'm using this for an HA
> setup for a Django application that writes a lot of custom files while
> working; while the volume was being healed I made sure that all the
> webapp traffic was hitting only the glu-tru node, the one that hadn't
> crashed.)

The volume stays accessible, but the files being healed are locked. That's probably why your app stayed online: web apps are usually a huge number of small-ish files, so locking them during a heal is pretty much invisible (healing a 2 KB file is almost instant). If you had huge files on this, without sharding, it would have been different :)

--
Kevin Lemonnier
PGP Fingerprint : 89A5 2283 04A0 E6E9 0111
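Sharding, mentioned above, splits large files into fixed-size chunks so that a heal only has to lock and copy the chunks that changed, rather than the whole file. It was introduced in the 3.7 series, so it is not available on the 3.4 install from the original post. A minimal sketch of enabling it on a volume (standard volume options; the block size value here is just an example, and sharding only applies to files created after it is turned on):

gluster volume set gv01 features.shard enable
gluster volume set gv01 features.shard-block-size 64MB

It is mainly aimed at large-file workloads such as VM images; for a web app made of many small files it makes little difference.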