Your comment actually helps me more than you think. One of the main doubts I have is whether to go for JBOD with replica 3, or SW RAID 6 with replica 2 + arbiter. Before reading your email I was leaning more towards JBOD, as reconstruction of a moderately big RAID 6 with mdadm can be painful too. Now I see a reconstruction is going to be painful either way...

For the record, the workload I am going to migrate is currently 18,314,445 MB and 34,752,784 inodes (which is not exactly the same as files, but let's use that for a rough estimate), for an average file size of about 539 KB per file.

Thanks a lot for your time and insights!

On 6/6/19 8:53, Hu Bert wrote:
> Good morning,
>
> my comment won't help you directly, but i thought i'd send it anyway...
>
> Our first glusterfs setup had 3 servers with 4 disks = 4 bricks (10 TB,
> JBOD) each. It was running fine in the beginning, but then 1 disk failed.
> The following heal took ~1 month, with bad performance (quite high IO).
> Shortly after the heal had finished, another disk failed -> same
> problems again. Not funny.
>
> For our new system we decided to use 3 servers with 10 disks (10 TB)
> each, but now with the 10 disks in SW RAID 10 (well, we split the 10
> disks into 2 SW RAID 10 arrays, each of them is a brick, so we have 2
> gluster volumes). A lot of disk space is "wasted" with this type of SW
> RAID and a replica 3 setup, but we wanted to avoid the "healing takes a
> long time with bad performance" problem. Now mdadm takes care of
> replicating data, and glusterfs should always see "good" bricks.
>
> And the decision may depend on what kind of data you have. Many small
> files, like tens of millions? Or not that many, but bigger files? I
> once watched a video (i think it was this one:
> https://www.youtube.com/watch?v=61HDVwttNYI). The recommendation there:
> RAID 6 or 10 for small files, for big files... well, the video is
> already 2 years "old" ;-)
>
> As i said, this won't help you directly. You have to identify what's
> most important for your scenario; as you said, high performance is not
> an issue - if that is still true even when you have slight performance
> issues after a disk failure, then OK. My experience so far: the bigger
> and slower the disks are and the more data you have -> healing will
> hurt -> try to avoid it. If the disks are small and fast (SSDs),
> healing will be faster -> JBOD is an option.
>
> hth,
> Hubert
>
> On Wed, 5 June 2019 at 11:33, Eduardo Mayoral <emayoral at arsys.es> wrote:
>> Hi,
>>
>> I am looking into a new gluster deployment to replace an ancient one.
>>
>> For this deployment I will be using some repurposed servers I
>> already have in stock. The disk specs are 12 * 3 TB SATA disks, no HW
>> RAID controller. They also have some SSDs which would be nice to
>> leverage as a cache or similar to improve performance, since they are
>> already there. Advice on how to leverage the SSDs would be greatly
>> appreciated.
>>
>> One of the design choices I have to make is using 3 nodes for a
>> replica-3 with JBOD, or using 2 nodes with a replica-2 and SW RAID 6
>> for the disks, maybe adding a 3rd node with a smaller amount of disk
>> as a metadata node for the replica set. I would love to hear advice
>> on the pros and cons of each setup from the gluster experts.
>>
>> The data will be accessed from 4 to 6 systems with native gluster,
>> not sure if that makes any difference.
>>
>> The amount of data I have to store there is currently 20 TB, with
>> moderate growth. IOPS are quite low, so high performance is not an
>> issue. The data will fit in either of the two setups.
>>
>> Thanks in advance for your advice!
>>
>> --
>> Eduardo Mayoral Jimeno
>> Systems engineer, platform department. Arsys Internet.
>> emayoral at arsys.es - +34 941 620 105 - ext 2153

--
Eduardo Mayoral Jimeno
Systems engineer, platform department. Arsys Internet.
emayoral at arsys.es - +34 941 620 105 - ext 2153
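As a rough cross-check of the figures in this thread, here is a back-of-the-envelope sketch in plain shell/awk. It assumes the MB/KB values above are binary (MiB/KiB), takes the 12 x 3 TB per-node disk count from the original mail, and ignores filesystem and RAID metadata overhead:

    # Sanity check of the quoted numbers (MB/KB treated as MiB/KiB).
    awk 'BEGIN {
        data_mib = 18314445            # current data set, MiB
        files    = 34752784            # inode count used as a file-count estimate

        printf "average file size : %.1f KiB\n", data_mib * 1024 / files
        printf "total data        : %.1f TiB\n", data_mib / 1024 / 1024

        # Rough usable capacity per layout, 12 x 3 TB SATA disks per node:
        printf "replica 3, JBOD   : %d TB usable (3 nodes, 1 full copy per node)\n", 12 * 3
        printf "replica 2 + RAID6 : %d TB usable (2 data nodes, 2 parity disks each)\n", (12 - 2) * 3
    }'

Both layouts comfortably hold the ~20 TB mentioned in the original mail; the difference is mostly in how failures are healed, not in capacity.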
Michael Metz-Martini
2019-Jun-06 18:46 UTC
[Gluster-users] Advice for setup: SW RAID 6 vs JBOD
Hi,

On 06.06.19 at 18:48, Eduardo Mayoral wrote:
> Your comment actually helps me more than you think. One of the main
> doubts I have is whether to go for JBOD with replica 3, or SW RAID 6
> with replica 2 + arbiter. Before reading your email I was leaning more
> towards JBOD, as reconstruction of a moderately big RAID 6 with mdadm
> can be painful too. Now I see a reconstruction is going to be painful
> either way...
>
> For the record, the workload I am going to migrate is currently
> 18,314,445 MB and 34,752,784 inodes (which is not exactly the same as
> files, but let's use that for a rough estimate), for an average file
> size of about 539 KB per file.
>
> Thanks a lot for your time and insights!

Currently we're hosting ~200 TB, split into about 3,500,000,000 files, on a distributed-replicate-2 gluster volume, with each brick running on a HW RAID 6 of 8 x 8 TB disks. As we have never had a failed drive till now, I can't tell you anything about recovery times, but rebalance is damn slow with such a high number of small files (and recovery on JBOD bricks should be similarly slow). I think RAID recovery from local disks will be much faster.

As our files are nearly 100% read-only and split-brain issues could be resolved more or less "easily", we decided against replica 3 in favor of hardware RAID 6 redundancy.

--
Kind regards
Michael Metz-Martini
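To make the replica choice being debated here concrete, below is a minimal sketch of how the two candidate layouts could be created at the gluster CLI level. The volume name, hostnames and brick paths are placeholders, peer probing and brick filesystem setup are omitted, and the `replica 3 arbiter 1` form (available in reasonably recent GlusterFS releases) is the "replica 2 + arbiter" option: the third brick of each set stores only file names and metadata, giving quorum against split-brain without a third full copy of the data.

    # Option A: replica 3 on JBOD bricks, one brick per disk, 3 full copies.
    # Bricks are grouped into replica sets in the order listed.
    gluster volume create gv0 replica 3 \
        node1:/bricks/disk01/brick node2:/bricks/disk01/brick node3:/bricks/disk01/brick \
        node1:/bricks/disk02/brick node2:/bricks/disk02/brick node3:/bricks/disk02/brick
        # ...and so on for the remaining disks

    # Option B: replica 2 + arbiter, one big SW RAID 6 brick per data node,
    # plus a small metadata-only brick on a third machine.
    gluster volume create gv0 replica 3 arbiter 1 \
        node1:/bricks/raid6/brick node2:/bricks/raid6/brick node3:/bricks/arbiter/brick

    gluster volume start gv0

With option A a failed disk means a full gluster heal of that brick; with option B a failed disk is rebuilt by mdadm underneath gluster, which is the trade-off discussed throughout this thread.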
If i remember correctly, in the video they suggested not to make a RAID 10 too big (i.e. too many (big) disks), because the RAID resync could then take a long time. They didn't mention a limit. On my 3 servers with 2 RAID 10 arrays each (1 x 4 disks, 1 x 6 disks), no disk has failed so far, but there were automatic periodic redundancy checks (mdadm checkarray) which ran for a couple of days, increasing the load on the servers and affecting the responsiveness of glusterfs on the clients. Still, almost no one even noticed that the mdadm checks were running :-)

But if i compare it with our old JBOD setup: after the disk change, the heal took about a month, resulting in really poor performance on the client side. As we didn't want to experience that period again -> throw hardware at the problem. Maybe a different setup (10 disks -> 5 RAID 1 arrays, building a distributed-replicated volume) would've been even better, but so far we're happy with the current setup.

On Thu, 6 June 2019 at 18:48, Eduardo Mayoral <emayoral at arsys.es> wrote:
> Your comment actually helps me more than you think. One of the main
> doubts I have is whether to go for JBOD with replica 3, or SW RAID 6
> with replica 2 + arbiter. Before reading your email I was leaning more
> towards JBOD, as reconstruction of a moderately big RAID 6 with mdadm
> can be painful too. Now I see a reconstruction is going to be painful
> either way...
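If the periodic mdadm checks (or a real rebuild) do start to hurt client responsiveness, the md resync bandwidth can be capped system-wide. A small sketch; the limits and the md device name are just examples and would need tuning for the actual disks:

    # Watch a running check/resync on the local arrays.
    cat /proc/mdstat

    # Throttle resync/check bandwidth (KiB/s per device) so that client
    # I/O keeps priority over the background check; example values only.
    echo 5000  > /proc/sys/dev/raid/speed_limit_min
    echo 50000 > /proc/sys/dev/raid/speed_limit_max

    # On Debian/Ubuntu the mdadm package runs checkarray periodically from
    # cron; a running check can also be cancelled by hand, e.g.:
    /usr/share/mdadm/checkarray --cancel /dev/md0

This only changes how aggressively md uses the disks for the check or rebuild; the trade-off is a longer check window in exchange for steadier gluster performance on the clients.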