On 10/26/2016 02:12 PM, Gandalf Corvotempesta wrote:
> 2016-10-26 23:07 GMT+02:00 Joe Julian <joe at julianfamily.org>:
>> And yes, they can fail, but 20TB is small enough to heal pretty quickly.
>
> 20TB small enough to build quickly? On which network? Gluster doesn't
> have a dedicated cluster network; if the cluster is being heavily
> accessed, the healing will slow down everything else (or everything
> else will slow down the healing).

Quickly = MTTR is within tolerances to continue to meet SLA. It's just math.

As for a dedicated heal network, split-horizon DNS handles that just fine.
Clients resolve a server's hostname to the "eth1" (for example) address and
the servers themselves resolve the same hostname to the "eth0" address. We
played with bonding but decided against the complexity.

> Anyway, you can heal quickly, but I still prefer to have the data safe on
> each node. If you start with 3 servers at once, each disk probably comes
> from the same batch, so a massive disk failure is easy to get.

There's preference and there's engineering to meet requirements. If your SLA
is 5 nines and you engineer 6 nines, you may realize that the difference
between a 99.99993% uptime and a 99.99997% uptime isn't worth the added
expense of doing replication /and/ raid-1.

> If you loose only 2 disks, one on each server, from the same replica
> group, you are game over. With RAID6, you have to loose 5 disks from
> the same replica group.

I never loose my drives. They're always firmly attached. :P

With 300 drives, 60 bricks, replica 3 (across 3 racks), I have six nines
availability for any one replica subvolume. If you really want to fudge the
numbers, the reliability of any given file is not worth calculating in that
volume. The odds of all three bricks failing for any one file among 20
distribute subvolumes are statistically infinitesimal.

> In my environment, I can create 4 RAID-0 arrays on each server (3 disks in
> each RAID-0), or 2 RAID-6 arrays with 6 disks each, or 1 RAID-6 with 12
> disks, or 1 RAID-7 with 12 disks (RAID-7 with fewer than 12 disks is
> nonsense). I don't know which one is better.

Just do the reliability calculations and engineer a storage system to meet
(exceed) your obligations within the available budget.
http://www.eventhelix.com/realtimemantra/faulthandling/system_reliability_availability.htm
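
(A minimal sketch of that math, for anyone following along. The two-nines
per-brick figure and the independent-failure assumption are mine, not
measured numbers from Joe's cluster:)

    # Minimal sketch of the replica availability math.
    # Assumptions (mine): brick failures are independent and every brick
    # has the same availability; 0.99 (two nines) is a made-up figure.

    def replica_subvolume_availability(brick_availability: float,
                                       replica: int) -> float:
        """A replica set is unavailable only if ALL of its bricks are down."""
        return 1.0 - (1.0 - brick_availability) ** replica

    brick = 0.99     # hypothetical per-brick availability
    replica = 3      # Joe's layout: 60 bricks, replica 3, 20 distribute subvols
    subvol = replica_subvolume_availability(brick, replica)

    print(f"availability of any one replica-3 subvolume: {subvol:.6f}")
    # -> 0.999999, i.e. six nines; a given file is unreachable only when
    #    all three bricks of the one subvolume holding it are down at once.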
Maybe a controversial question (and hopefully not trolling), but is there any
particular reason you chose Gluster over Ceph for these larger setups, Joe?
For myself, Gluster is much easier to manage and provides better performance
on my small non-enterprise setup, plus it plays nice with ZFS. But I thought
Ceph had the edge on large, many-node, many-disk setups. It would seem it
handles adding/removing disks better than the juggling you have to do with
Gluster to keep replication triads even. Too complex/fragile maybe?
Genuinely curious.

--
Lindsay Mathieson
2016-10-26 23:38 GMT+02:00 Joe Julian <joe at julianfamily.org>:
> Quickly = MTTR is within tolerances to continue to meet SLA. It's just math.

Obviously yes. But in the real world you can have the best SLAs in the
world, and still, if you lose data, you lose customers.

> As for a dedicated heal network, split-horizon dns handles that just fine.
> Clients resolve a server's hostname to the "eth1" (for example) address and
> the servers themselves resolve the same hostname to the "eth0" address. We
> played with bonding but decided against the complexity.

Good idea, thanks. In this way the cluster network is separated from the
client network, like with Ceph. Just a question: you need two DNS
infrastructures for this, right? ns1 and ns2 used by clients, pointing to
eth0, and ns3 and ns4 used by Gluster, pointing to eth1. In a small
environment the hosts file could be used, but I prefer the DNS way.

> There's preference and there's engineering to meet requirements. If your SLA
> is 5 nines and you engineer 6 nines, you may realize that the difference
> between a 99.99993% uptime and a 99.99997% uptime isn't worth the added
> expense of doing replication and raid-1.

How do you calculate the number of nines in this environment? For example,
to have 6 nines (for availability and data consistency), which configuration
should I adopt? I could have 6 nines for the whole cluster but only 2 nines
for the data. In the first case the whole cluster can't go totally down
(tons of nodes, for example); in the second, some data could be lost
(replica 1 or 2).

> With 300 drives, 60 bricks, replica 3 (across 3 racks), I have a six nines
> availability for any one replica subvolume. If you really want to fudge the
> numbers, the reliability for any given file is not worth calculating in that
> volume. The odds of all three bricks failing for any 1 file among 20
> distribute subvolumes is statistically infinitesimal.

How many servers? 300 drives bought within a very short time are likely to
fail quickly, with multiple failures at a time. I had 2 drive failures in
less than 1 hour a few months ago. Fortunately, I was using RAID-6. Both
drives were from the same manufacturer and had sequential serial numbers.
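
(As an illustration of the hosts-file variant: something like the following
could work; all names and addresses are made up. One copy is pushed to the
clients and the other to the Gluster servers, so the same hostname resolves
to a different network on each side.)

    # /etc/hosts pushed to clients -- names resolve to the client-facing network
    192.0.2.11     gluster1.example.com gluster1
    192.0.2.12     gluster2.example.com gluster2
    192.0.2.13     gluster3.example.com gluster3

    # /etc/hosts pushed to the Gluster servers -- same names, heal/backend network
    198.51.100.11  gluster1.example.com gluster1
    198.51.100.12  gluster2.example.com gluster2
    198.51.100.13  gluster3.example.com gluster3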
2016-10-26 23:38 GMT+02:00 Joe Julian <joe at julianfamily.org>:
> Just do the reliability calculations and engineer a storage system to meet
> (exceed) your obligations within the available budget.
> http://www.eventhelix.com/realtimemantra/faulthandling/system_reliability_availability.htm

This is good for evaluating the reliability of 3 nodes in parallel (like
replica 3). With three nodes at 2 nines each, I'll get 6 nines. But how can
I calculate the number of nines for each single server?
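
(For what it's worth, a rough sketch of how the per-server number is usually
built up, along the lines of the series/parallel math in the article above.
Every MTBF/MTTR figure below is invented purely for illustration:)

    # Sketch of the series/parallel availability math (figures are made up).
    # A component's availability: MTBF / (MTBF + MTTR).
    # Components that are all required (one server) combine in series: multiply.
    # Redundant copies (replica 3) combine in parallel: 1 - product of downtimes.

    def availability(mtbf_hours: float, mttr_hours: float) -> float:
        return mtbf_hours / (mtbf_hours + mttr_hours)

    def series(*parts: float) -> float:
        a = 1.0
        for p in parts:
            a *= p
        return a

    def parallel(*parts: float) -> float:
        u = 1.0
        for p in parts:
            u *= 1.0 - p
        return 1.0 - u

    # Hypothetical per-server components: RAID set, controller, NIC, PSU.
    server = series(
        availability(200_000, 24),   # RAID-6 array (already internally redundant)
        availability(500_000, 8),    # RAID controller
        availability(1_000_000, 4),  # NIC
        availability(300_000, 4),    # power supply
    )
    print(f"single server: {server:.6f}")
    print(f"replica 3 of such servers: {parallel(server, server, server):.9f}")

The series product is the single-server number; plugging it into the
parallel formula then gives the cluster-level nines.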