Mario Splivalo
2016-May-01 12:03 UTC
[Gluster-users] Replacing failed node (2node replication)
Hello. I have set up glusterfs on two nodes, with a replicated volume across two bricks (one on each server):

root@glu-tru:~# gluster volume info

Volume Name: gv01
Type: Replicate
Volume ID: 28dcf1d4-8a6c-4b70-a075-9e1dc4215271
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: glu-tru:/srv/gfs-bucket
Brick2: glu-pre:/srv/gfs-bucket
root@glu-tru:~#

Now, I lost one server (glu-pre) and replaced it with a fresh one. The hostname and IP address of the new server are the same as the old one's.

This is how I re-added the fresh server:

glu-tru# gluster volume remove-brick gv01 replica 1 glu-pre:/srv/gfs-bucket
glu-tru# gluster peer detach glu-pre

Then I installed glusterfs on glu-pre (the freshly installed box), created the brick directory, and ran:

glu-tru# gluster peer probe glu-pre
glu-pre# gluster peer probe glu-tru
glu-pre# gluster volume add-brick gv01 replica 2 glu-pre:/srv/gfs-bucket force
glu-tru# gluster volume heal gv01 full

After the last command, /srv/gfs-bucket started to get populated on the newly added server (glu-pre).

Now, is this the proper procedure for replacing a failed server? I've read the documentation (http://tinyurl.com/j3pnjza, and for instance: https://support.rackspace.com/how-to/recover-from-a-failed-server-in-a-glusterfs-array/) - but those mention that the volume will be inaccessible during the healing period. They also seem more complicated, changing UUIDs in config files and so on.

With the commands I pasted above I had a perfectly fine running volume that was accessible the whole time while the new server was being re-added, and also during the healing period. (I'm using this for an HA setup for a Django application that writes a lot of custom files while working; while the volume was being healed I made sure that all the webapp traffic was hitting only the glu-tru node, the one that hadn't crashed.)

I'd appreciate some comments from more experienced glusterfs users.

	Mario

P.S. This is glusterfs 3.4 on Ubuntu 14.04.

--
Mario Splivalo
mario at splivalo.hr

"I can do it quick, I can do it cheap, I can do it well. Pick any two."
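For reference on the procedure above: peer state and heal progress can be checked from either node while the volume stays online. A minimal sketch using the standard gluster CLI, assuming the volume name gv01 from the post (the output format varies between gluster releases):

glu-tru# gluster peer status                          # confirm glu-pre is connected again
glu-tru# gluster volume status gv01                   # both bricks should show as online
glu-tru# gluster volume heal gv01 info                # entries still pending heal, per brick
glu-tru# gluster volume heal gv01 info split-brain    # anything needing manual attention

Once heal info reports no pending entries on either brick, the replica is back in sync; until then, files actively being healed may be briefly locked, as noted in the reply below.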
Kevin Lemonnier
2016-May-01 18:43 UTC
[Gluster-users] Replacing failed node (2node replication)
> With the commands I pasted above I had a perfectly fine running volume
> that was accessible the whole time while the new server was being
> re-added, and also during the healing period. (I'm using this for an HA
> setup for a Django application that writes a lot of custom files while
> working; while the volume was being healed I made sure that all the
> webapp traffic was hitting only the glu-tru node, the one that hadn't
> crashed.)

The volume stays accessible, but the files being healed are locked. That's probably why your app stayed online: web apps are usually a huge number of small-ish files, so locking them during a heal is pretty much invisible (healing a 2 KB file is almost instant). If you had huge files on this, without sharding, it would have been different :)

--
Kevin Lemonnier
PGP Fingerprint : 89A5 2283 04A0 E6E9 0111
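Sharding, mentioned above, splits large files into fixed-size chunks so that a heal only has to lock and copy the chunks that changed, rather than the whole file. It was introduced in the 3.7 series, so it is not available on the 3.4 install from the original post. A minimal sketch of enabling it on a volume (standard volume options; the block size value here is just an example, and sharding only applies to files created after it is turned on):

gluster volume set gv01 features.shard enable
gluster volume set gv01 features.shard-block-size 64MB

It is mainly aimed at large-file workloads such as VM images; for a web app made of many small files it makes little difference.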