thr3ads.net - Gluster users - [Gluster-users] Failed file system [Aug 2016]


Andres E. Moya

2016-Aug-03 19:39 UTC

[Gluster-users] Failed file system

Does anyone else have input? 

We are currently running on only one node; the other node of the replicated
volume is offline.

We are not experiencing any downtime because the one node is up.

I do not understand which is the best way to bring the second node back up.

Do we just re-create the file system and the mount points on the node that is
down and allow Gluster to heal? (My concern with this is whether the node that
is down will somehow take precedence and wipe out the data on the healthy node,
instead of vice versa.)

Or do we fully wipe out the config on the node that is down, re-create the file
system, re-add the node to Gluster using the add-brick command with replica 3,
wait for it to heal, and then run the remove-brick command for the failed brick?

Which would be the safest and easiest to accomplish?

Thanks for any input.




From: "Leno Vo" <lenovolastname at yahoo.com> 
To: "Andres E. Moya" <amoya at moyasolutions.com> 
Cc: "gluster-users" <gluster-users at gluster.org> 
Sent: Tuesday, August 2, 2016 6:45:27 PM 
Subject: Re: [Gluster-users] Failed file system 

If you don't want any downtime (in case your node 2 has really died), you have
to create a new Gluster SAN (if you have the resources, of course; three nodes
if at all possible this time) and then just migrate your VMs (or files). That
way there is no downtime, but you have to cross your fingers that the only
remaining node does not die too... Also, without sharding, a VM migration,
especially of an RDP one, will mean slow access for users until it has migrated.

You should start testing sharding; it's fast and cool...




On Tuesday, August 2, 2016 2:51 PM, Andres E. Moya <amoya at
moyasolutions.com> wrote:


Couldn't we just add a new server with

gluster peer probe
gluster volume add-brick replica 3 (will this command succeed with 1 current
failed brick?)

let it heal, then

gluster volume remove-brick
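
Spelled out with full arguments, the sequence asked about here might look like the sketch below. The volume name `v1`, the host `server4`, and the brick paths are placeholders rather than values from this thread, and the hypothetical helper only prints the commands instead of running them. Whether add-brick succeeds while one brick is down is exactly the open question above, so treat this as an illustration of the syntax, not a tested procedure.

```shell
# Print (do not run) the add-brick / remove-brick sequence.
# All arguments are hypothetical placeholders:
#   $1 = volume name
#   $2 = new brick as host:/path
#   $3 = failed brick as host:/path
print_swap_via_replica3() {
  new_host=${2%%:*}   # host part of the new brick, used for the peer probe
  echo "gluster peer probe $new_host"
  echo "gluster volume add-brick $1 replica 3 $2"
  echo "# wait until 'gluster volume heal $1 info' shows no pending entries"
  echo "gluster volume remove-brick $1 replica 2 $3 force"
}

print_swap_via_replica3 v1 server4:/gfs/b1/v1 server2:/gfs/b1/v1
```

Printing first makes it easy to review the exact brick names before anything touches the volume.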

From: "Leno Vo" <lenovolastname at yahoo.com> 
To: "Andres E. Moya" <amoya at moyasolutions.com>,
"gluster-users" <gluster-users at gluster.org>
Sent: Tuesday, August 2, 2016 1:26:42 PM 
Subject: Re: [Gluster-users] Failed file system 

You need downtime to recreate the second node. Two nodes are actually not good
for production, and you should have put RAID 1 or RAID 5 under your Gluster
storage. While you recreate the second node, you might try running only the VMs
that need to be up and keeping the rest down, but stop all backups, and if you
have replication, stop it too. If you have a 1G NIC, 2 CPUs and less than 8 GB
of RAM, then I suggest turning off all the VMs during the recreation of the
second node. Someone said that if you have sharding with 3.7.x, maybe some VIP
VMs can stay up...

If it is just a file system, then just turn off the backup service until you
recreate the second node. Depending on your resources and how big your storage
is, it might take hours or even days to recreate...

Here's my process for recreating the second or third node (copied and modified
from the net):

#make sure the partition is already added!
This procedure is for replacing a failed server, IF your newly installed server
has the same hostname as the failed one:

(If your new server will have a different hostname, see this article instead.) 

For purposes of this example, the server that crashed will be server3 and the
other servers will be server1 and server2

On both server1 and server2, make sure hostname server3 resolves to the correct
IP address of the new replacement server.
#On either server1 or server2, do 
grep server3 /var/lib/glusterd/peers/* 

This will return a uuid followed by ":hostname1=server3" 

#On server3, make sure glusterd is stopped, then do 
echo UUID={uuid from previous step}>/var/lib/glusterd/glusterd.info 
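
The UUID lookup above can be wrapped in a small helper so the value does not have to be copied out of grep output by hand. This is a minimal sketch: the function name is made up for illustration, and it assumes each file under the peers directory is named after the peer's UUID and contains a `hostname1=` line, which is what the grep result described above suggests.

```shell
# Print the UUID that glusterd has on record for a peer hostname.
#   $1 = peer hostname
#   $2 = peers directory (defaults to /var/lib/glusterd/peers)
# Assumption: each peer file is named after the peer's UUID.
peer_uuid_for_host() {
  for f in "${2:-/var/lib/glusterd/peers}"/*; do
    if grep -q "^hostname1=$1\$" "$f" 2>/dev/null; then
      basename "$f"
    fi
  done
}

# Run the lookup on server1 or server2:
#   peer_uuid_for_host server3
# Then, on server3 with glusterd stopped, write the printed value:
#   echo "UUID=<uuid printed above>" > /var/lib/glusterd/glusterd.info
```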

#actual test output below:
[root@node1 ~]# cat /var/lib/glusterd/glusterd.info
UUID=4b9d153c-5958-4dbe-8f91-7b5002882aac
operating-version=30710
#the second line is new... maybe not needed...

On server3: 
make sure that all brick directories are created/mounted 
start glusterd 
peer probe one of the existing servers 

#restart glusterd, check that full peer list has been populated using 
gluster peer status 

(if peers are missing, probe them explicitly, then restart glusterd again) 
#check that full volume configuration has been populated using 
gluster volume info 

if volume configuration is missing, do 
#on the other node 
gluster volume sync "replace-node" all 

#on the node to be replaced 
setfattr -n trusted.glusterfs.volume-id -v 0x$(grep volume-id /var/lib/glusterd/vols/v1/info | cut -d= -f2 | sed 's/-//g') /gfs/b1/v1
setfattr -n trusted.glusterfs.volume-id -v 0x$(grep volume-id /var/lib/glusterd/vols/v2/info | cut -d= -f2 | sed 's/-//g') /gfs/b2/v2
setfattr -n trusted.glusterfs.volume-id -v 0x$(grep volume-id /var/lib/glusterd/vols/config/info | cut -d= -f2 | sed 's/-//g') /gfs/b1/config/c1
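
The three setfattr commands repeat the same pipeline, so it can be factored into a helper. The sketch below is only an illustration: the function names are made up, and it assumes the volume's info file holds a single line of the form `volume-id=<uuid>`, which is what the grep and cut above rely on.

```shell
# Print a volume's ID as the bare hex string that setfattr expects after "0x".
#   $1 = path to the volume's info file under /var/lib/glusterd/vols/
volume_id_hex() {
  grep '^volume-id=' "$1" | cut -d= -f2 | sed 's/-//g'
}

# Stamp a brick directory with its volume ID (run as root on the rebuilt node).
#   $1 = volume name
#   $2 = brick directory
stamp_brick() {
  setfattr -n trusted.glusterfs.volume-id \
    -v "0x$(volume_id_hex "/var/lib/glusterd/vols/$1/info")" "$2"
}

# Usage, with the volume and brick names from the commands above:
#   stamp_brick v1 /gfs/b1/v1
#   stamp_brick v2 /gfs/b2/v2
#   stamp_brick config /gfs/b1/config/c1
```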

mount -t glusterfs localhost:config /data/data1 

#install ctdb if not yet installed and put it back online; use the steps for
#creating the ctdb config, but use your common sense not to delete or modify
#the current one.

gluster vol heal v1 full 
gluster vol heal v2 full 
gluster vol heal config full 
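
A full heal can run for a long time, so it helps to watch how many entries are still pending. The sketch below sums the per-brick counts from `gluster volume heal <volume> info`. It assumes that output contains one `Number of entries: N` line per brick; the helper name is made up for illustration, and the output format may differ between Gluster versions.

```shell
# Sum the "Number of entries:" lines from heal info output read on stdin.
# Non-numeric counts (for example from a brick that is down) count as 0.
pending_heal_entries() {
  awk -F': ' '/^Number of entries:/ { total += $2 } END { print total + 0 }'
}

# Usage on a live cluster, with a volume name from the commands above:
#   gluster volume heal v1 info | pending_heal_entries
```

When the printed total reaches 0 for every volume, the rebuilt bricks should have caught up with the healthy node.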



On Tuesday, August 2, 2016 11:57 AM, Andres E. Moya <amoya at
moyasolutions.com> wrote:


Hi, we have a 2-node replica setup.
On one of the nodes, the file system that held the brick failed, not the OS.
Can we re-create a file system and mount the bricks on the same mount point?

What will happen? Will the data from the other node sync over, or will the
failed node wipe out the data on the other node?

What would be the correct process?

Thanks in advance for any help
_______________________________________________ 
Gluster-users mailing list 
Gluster-users at gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users







Mahdi Adnan

2016-Aug-03 19:55 UTC


[Gluster-users] Failed file system

Hi,
I'm not an expert in Gluster, but I think it would be better to replace the
downed brick with a new one. Maybe start from here:
https://gluster.readthedocs.io/en/latest/Administrator%20Guide/Managing%20Volumes/#replace-brick


-- 



Respectfully

    Mahdi A. Mahdi




Atin Mukherjee

2016-Aug-03 19:58 UTC


[Gluster-users] Failed file system

Use replace-brick commit force.

@Pranith/@Anuradha - after this, will self-heal be triggered automatically, or
is a manual trigger needed?
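
For reference, a replace-brick invocation with full arguments might look like the sketch below. The volume name, host, and brick paths are placeholders rather than values from this thread, and the hypothetical helper only prints the command. The new brick is given a different path from the failed one, on the assumption that Gluster will not accept the old path as its own replacement.

```shell
# Print (do not run) a replace-brick command.
#   $1 = volume name
#   $2 = failed brick as host:/path
#   $3 = new, empty brick as host:/path
print_replace_brick() {
  echo "gluster volume replace-brick $1 $2 $3 commit force"
}

print_replace_brick v1 server2:/gfs/b1/v1 server2:/gfs/b1/v1_new
```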

-- 
--Atin
