thr3ads.net - Gluster users - [Gluster-users] How much disk can fail after a catastrophic failure occur? [Oct 2024]

Gilberto Ferreira

2024-Oct-19 15:25 UTC

[Gluster-users] How much disk can fail after a catastrophic failure occur?

Hi there.
I have 2 servers, each with this set of disks:

pve01:~# df | grep disco
/dev/sdd          1.0T  9.4G 1015G   1% /disco1TB-0
/dev/sdh          1.0T  9.3G 1015G   1% /disco1TB-3
/dev/sde          1.0T  9.5G 1015G   1% /disco1TB-1
/dev/sdf          1.0T  9.4G 1015G   1% /disco1TB-2
/dev/sdg          2.0T   19G  2.0T   1% /disco2TB-1
/dev/sdc          2.0T   19G  2.0T   1% /disco2TB-0
/dev/sdj          1.0T  9.2G 1015G   1% /disco1TB-4

The gluster volume is Type: Distributed-Replicate.
So my question is: how many disks can fail before I start losing data?

Thanks in advance

---


Gilberto Nunes Ferreira

Andreas Schwibbe

2024-Oct-20 11:29 UTC


Gilberto,

this depends entirely on your setup.

With replica 2 you always have 2 copies of each file.
So when you add bricks to your volume you'll want to add
Server1/disco1TB-0 and Server2/disco1TB-0 as a pair.
That way each file is stored once on each server, on one disk of the pair.
Your volume can then lose 1 disk of every pair, OR 1 whole server, and
still be up.
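
To make the pairing concrete, here is a minimal Python sketch that builds the brick order for a replica 2 volume so that consecutive bricks sit on different servers. The second server name (pve02), the volume name (VOL) and the "brick" subdirectory are assumptions for illustration; only pve01 and the mount points appear in the thread.

```python
def paired_brick_order(servers, mounts, brick_dir="brick"):
    """Interleave servers per mount so consecutive bricks form one replica set."""
    return [f"{server}:{mount}/{brick_dir}" for mount in mounts for server in servers]


# pve02 is an assumed name for the second server; "brick" is a hypothetical subdirectory.
servers = ["pve01", "pve02"]
# A subset of the mount points shown in the df output.
mounts = ["/disco1TB-0", "/disco1TB-1", "/disco2TB-0"]

bricks = paired_brick_order(servers, mounts)
# VOL is a placeholder volume name.
command = "gluster volume create VOL replica 2 " + " ".join(bricks)
print(command)
```

The order of the bricks is what matters: with replica 2, each consecutive pair on the command line becomes one replica set, so the same mount point on both servers ends up holding the same files.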

However, I recommend not using replica 2, as you'll run into split-brain
problems when 1 server is down.
When it comes back up, you might have 2 versions of the same file
and you need a strategy to figure out which of the two copies is
the current one.
You can, however, set the volume to read-only while 1 server is down; then
you cannot get any split-brains, but this may mean downtime,
depending on your use case.

That is why you should use at least replica 2 + 1 arbiter.
The arbiter holds only a metadata copy of each file (so the hardware
requirement for this server is pretty low and it doesn't need huge
disks), which makes it easy to find the valid copy of a file and heal the
invalid one. (I once had a NUC as arbiter, running totally fine.) [When
using an arbiter, be sure to create the XFS filesystem with imaxpct=75 on
the arbiter, as its bricks will hold metadata only, not file contents.]

If you've got enough resources for 3 servers, replica 3 is best.

When you run
gluster v status
and you have replica 2,
then the first two rows are a pair;
if you have set replica 3,
then the first three rows belong together and hold copies of the same
file.
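
As a rough illustration of that rule, the following Python sketch splits an ordered brick list into replica sets. The brick list is typed in by hand here (it is not parsed from real gluster output), and the brick names are hypothetical, modelled on the Server1/Server2 naming above.

```python
def replica_sets(bricks, replica):
    """Split an ordered brick list into consecutive groups of size `replica`."""
    if replica < 1 or len(bricks) % replica != 0:
        raise ValueError("brick count must be a multiple of the replica count")
    return [bricks[i:i + replica] for i in range(0, len(bricks), replica)]


# Hypothetical brick order, as it would be listed for the volume.
bricks = [
    "Server1:/disco1TB-0",
    "Server2:/disco1TB-0",
    "Server1:/disco1TB-1",
    "Server2:/disco1TB-1",
]

for number, group in enumerate(replica_sets(bricks, replica=2), start=1):
    print(f"replica set {number}: {group}")
```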

Cheers,
A.

________

Community Meeting Calendar:

Schedule -
Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
Bridge: https://meet.google.com/cpu-eiue-hvk
Gluster-users mailing list
Gluster-users at gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users

Strahil Nikolov

2024-Oct-20 20:34 UTC


If it's replica 2, you can lose up to 1 replica per distribution group. For
example, if you have a volume TEST with this setup:

server1:/brick1
server2:/brick1
server1:/brick2
server2:/brick2

You can lose any one brick of the replica "/brick1" and any one brick of the
replica "/brick2". So if you lose server1:/brick1 and server2:/brick2,
there is no data loss.
As usual, consider whether you can add an arbiter to your volumes.
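
The rule for the four-brick TEST example can be checked with a small Python sketch. It only models whether at least one copy of every file survives; it does not model quorum or split-brain. The function name is hypothetical.

```python
def lost_replica_sets(bricks, replica, failed):
    """Return the replica sets in which every brick has failed (data is lost)."""
    sets = [bricks[i:i + replica] for i in range(0, len(bricks), replica)]
    return [group for group in sets if all(brick in failed for brick in group)]


# Brick order of the TEST volume from the example above.
bricks = [
    "server1:/brick1",
    "server2:/brick1",
    "server1:/brick2",
    "server2:/brick2",
]

# One brick lost in each replica set: every file still has a surviving copy.
print(lost_replica_sets(bricks, 2, {"server1:/brick1", "server2:/brick2"}))

# Both bricks of the "/brick1" replica lost: that set's data is gone.
print(lost_replica_sets(bricks, 2, {"server1:/brick1", "server2:/brick1"}))
```

An empty list means no replica set has lost all of its copies. Losing all of server1 also returns an empty list, because each set still has its copy on server2.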

Best Regards,
Strahil Nikolov
