thr3ads.net - Gluster users - [Gluster-users] Advise on recovering from a bad replica please [Jun 2014]

John Gardeniers

2014-Jun-24 22:59 UTC

[Gluster-users] Advise on recovering from a bad replica please

Hi All,

We're using Gluster as the storage for our virtualization. This consists
of 2 servers with a single brick each configured as a replica pair. We
also have a geo-replica on one of those two servers.

For reasons that don't really matter, last weekend we had a situation
which caused one server to reboot a number of times, which in turn
resulted in a lot of heal-failed and split-brain errors. Because at the
same time VMs were being migrated across hosts we ended up with many
crashed VMs.

Due to the need to get the VMs up and running as quickly as possible
we decided to shut down one Gluster replica and use the "primary" one
alone. As the geo-replica is also on the node we shut down, that leaves
us with just a single copy, which makes us rather nervous.

As we have decided to treat the files on the currently running node as
"correct", I'd appreciate advice on the best way to get the other node
back into the replication. Should we simply bring it back online and
try to correct the errors that I expect will be many or should we treat
it as a failed server and bring it back with an empty brick, rather than
what is currently in the existing brick? The volume/bricks are 5TB, of
which we're currently using around 2TB and the servers are on a 10Gb
network, so I imagine it shouldn't take too long to rebuild and this
would all be done out of hours anyway.
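
As a rough sanity check on "shouldn't take too long", here is a back-of-envelope sketch of the copy time for 2TB over a 10Gb link. The function name and the `efficiency` parameter are made up for illustration, and the 30% figure is only a guess. A real heal will be slower than line rate because of disk speed, file count and self-heal overhead, so treat the first number as a lower bound.

```python
def estimate_transfer_seconds(data_bytes, link_bits_per_sec, efficiency=1.0):
    """Seconds needed to move data_bytes over a link.

    efficiency is the assumed fraction of line rate actually achieved
    (hypothetical knob; 1.0 means a perfect, saturated link).
    """
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    return data_bytes * 8 / (link_bits_per_sec * efficiency)


TB = 10**12    # decimal terabyte, in bytes
GBIT = 10**9   # gigabit, in bits

# Figures from the post: about 2TB in use, 10Gb network.
best_case = estimate_transfer_seconds(2 * TB, 10 * GBIT)

# Assumed 30% of line rate, purely as an illustration of a slower heal.
pessimistic = estimate_transfer_seconds(2 * TB, 10 * GBIT, efficiency=0.3)

print(f"best case:   {best_case / 60:.0f} minutes")
print(f"pessimistic: {pessimistic / 3600:.1f} hours")
```

Even with the pessimistic guess the rebuild fits comfortably into an out-of-hours window.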

regards,
John

Pranith Kumar Karampuri

2014-Jun-25 09:05 UTC

On 06/25/2014 04:29 AM, John Gardeniers wrote:
> Hi All,
>
> We're using Gluster as the storage for our virtualization. This consists
> of 2 servers with a single brick each configured as a replica pair. We
> also have a geo-replica on one of those two servers.
>
> For reasons that don't really matter, last weekend we had a situation
> which caused one server to reboot a number of times, which in turn
> resulted in a lot of heal-failed and split-brain errors. Because at the
> same time VMs were being migrated across hosts we ended up with many
> crashed VMs.
>
> Due to the need to get the VMs up and running as quickly as possible
> we decided to shut down one Gluster replica and use the "primary" one
> alone. As the geo-replica is also on the node we shut down, that leaves
> us with just a single copy, which makes us rather nervous.
>
> As we have decided to treat the files on the currently running node as
> "correct", I'd appreciate advice on the best way to get the other node
> back into the replication. Should we simply bring it back online and
> try to correct the errors that I expect will be many or should we treat
> it as a failed server and bring it back with an empty brick, rather than
> what is currently in the existing brick? The volume/bricks are 5TB, of
> which we're currently using around 2TB and the servers are on a 10Gb
> network, so I imagine it shouldn't take too long to rebuild and this
> would all be done out of hours anyway.

Since you say there were split-brain related errors as well, I suggest
you bring it back with an empty brick.
Could you give the "gluster volume info" output and tell me which brick
went down? Based on that I will tell you what you need to do.
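
If it helps to collect that information, below is a minimal sketch that runs "gluster volume info" and lists the bricks it reports. The helper names, the volume name "vmstore" and the sample output are all hypothetical. The sample only imitates the usual "BrickN: host:/path" lines, and your version may print extra fields.

```python
import re
import subprocess

# Matches lines such as "Brick1: server1:/export/brick1".
# The "Bricks:" header line has no digit after "Brick", so it is skipped.
BRICK_LINE = re.compile(r"^Brick\d+:[ \t]*(\S+?):(/\S*)[ \t]*$", re.MULTILINE)


def parse_bricks(volume_info_text):
    """Return a list of (host, path) tuples found in 'gluster volume info' text."""
    return BRICK_LINE.findall(volume_info_text)


def get_volume_info(volume_name):
    """Run 'gluster volume info <volume_name>' and return its stdout.

    Needs the gluster CLI on this machine and sufficient privileges.
    """
    result = subprocess.run(
        ["gluster", "volume", "info", volume_name],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


# Hypothetical sample output, used only to show the parsing.
SAMPLE = """\
Volume Name: vmstore
Type: Replicate
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: server1:/export/brick1
Brick2: server2:/export/brick1
"""

for host, path in parse_bricks(SAMPLE):
    print(f"{host} -> {path}")
```

On a real node you would pass `get_volume_info("<your volume>")` to `parse_bricks` instead of `SAMPLE`, then note which of the listed hosts is the one that was shut down.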

Pranith
>
> regards,
> John
