Good morning,

my comment won't help you directly, but I thought I'd send it anyway...

Our first glusterfs setup had 3 servers with 4 disks = bricks (10 TB, JBOD) each. It ran fine in the beginning, but then 1 disk failed. The following heal took ~1 month, with bad performance (quite high IO). Shortly after the heal had finished, another disk failed -> same problems again. Not funny.

For our new system we decided to use 3 servers with 10 disks (10 TB) each, but now the 10 disks are in SW RAID 10 (well, we split the 10 disks into 2 SW RAID 10 arrays, each of them is a brick, so we have 2 gluster volumes). A lot of disk space is "wasted" with this type of SW RAID and a replica 3 setup, but we wanted to avoid the "healing takes a long time with bad performance" problem. Now mdadm takes care of replicating data, and glusterfs should always see "good" bricks.

The decision may also depend on what kind of data you have. Many small files, like tens of millions? Or fewer, but bigger files? I once watched a video (I think it was this one: https://www.youtube.com/watch?v=61HDVwttNYI). The recommendation there: RAID 6 or 10 for small files, for big files... well, it's already 2 years "old" ;-)

As I said, this won't help you directly. You have to identify what's most important for your scenario; as you said, high performance is not an issue - if that still holds when performance degrades somewhat after a disk failure, then fine. My experience so far: the bigger and slower the disks and the more data you have -> healing will hurt -> try to avoid it. If the disks are small and fast (SSDs), healing will be faster -> JBOD is an option.

hth,
Hubert

On Wed, Jun 5, 2019 at 11:33, Eduardo Mayoral <emayoral at arsys.es> wrote:
>
> Hi,
>
> I am looking into a new gluster deployment to replace an ancient one.
>
> For this deployment I will be using some repurposed servers I already
> have in stock. The disk specs are 12 * 3 TB SATA disks, no HW RAID
> controller. They also have some SSDs which it would be nice to leverage
> as cache or similar to improve performance, since they are already there.
> Advice on how to leverage the SSDs would be greatly appreciated.
>
> One of the design choices I have to make is using 3 nodes for a
> replica 3 with JBOD, or using 2 nodes with a replica 2 and SW RAID 6
> for the disks, maybe adding a 3rd node with a smaller amount of disk
> as a metadata node for the replica set. I would love to hear advice on
> the pros and cons of each setup from the gluster experts.
>
> The data will be accessed from 4 to 6 systems with native gluster, not
> sure if that makes any difference.
>
> The amount of data I have to store there is currently 20 TB, with
> moderate growth. iops are quite low, so high performance is not an
> issue. The data will fit in either of the two setups.
>
> Thanks in advance for your advice!
>
> --
> Eduardo Mayoral Jimeno
> Systems engineer, platform department. Arsys Internet.
> emayoral at arsys.es - +34 941 620 105 - ext 2153
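For readers trying to picture the layout Hubert describes, here is a minimal sketch of one way it could be built. The hostnames, device names and exact options are hypothetical, not taken from his setup; adjust to your own hardware.

    # On each of the 3 servers: build two SW RAID 10 arrays from the local
    # disks (here 10 disks split 5+5; mdadm's raid10 driver also accepts
    # odd device counts).
    mdadm --create /dev/md0 --level=10 --raid-devices=5 /dev/sd[b-f]
    mdadm --create /dev/md1 --level=10 --raid-devices=5 /dev/sd[g-k]
    mkfs.xfs -i size=512 /dev/md0 && mkfs.xfs -i size=512 /dev/md1
    mkdir -p /bricks/brick1 /bricks/brick2
    mount /dev/md0 /bricks/brick1 && mount /dev/md1 /bricks/brick2

    # From one node: one replica-3 volume per RAID array, so every brick
    # gluster sees is already redundant at the mdadm level.
    gluster volume create vol1 replica 3 \
        srv1:/bricks/brick1/data srv2:/bricks/brick1/data srv3:/bricks/brick1/data
    gluster volume create vol2 replica 3 \
        srv1:/bricks/brick2/data srv2:/bricks/brick2/data srv3:/bricks/brick2/data
    gluster volume start vol1 && gluster volume start vol2

With this pattern a failed disk is rebuilt by mdadm inside one node, so gluster itself never has to run a month-long heal against a replacement brick.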
Your comment actually helps me more than you think. One of the main doubts I have is whether to go for JBOD with replica 3 or SW RAID 6 with replica 2 + arbiter. Before reading your email I was leaning more towards JBOD, as the reconstruction of a moderately big RAID 6 with mdadm can be painful too. Now I see a rebuild is going to be painful either way...

For the record, the workload I am going to migrate is currently 18,314,445 MB and 34,752,784 inodes (which is not exactly the same as the number of files, but let's use it for a rough estimate), for an average file size of about 539 KB per file.

Thanks a lot for your time and insights!

On 6/6/19 8:53, Hu Bert wrote:
> [...]
--
Eduardo Mayoral Jimeno
Systems engineer, platform department. Arsys Internet.
emayoral at arsys.es - +34 941 620 105 - ext 2153
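The "replica 2 + arbiter" option Eduardo mentions is expressed in gluster as "replica 3 arbiter 1" (two data bricks plus one metadata-only brick). A minimal sketch with hypothetical hostnames and brick paths, plus a quick check of the average-file-size figure quoted above:

    # Sanity check of the average file size (assuming 1 MB = 1024 KB):
    # 18,314,445 MB * 1024 / 34,752,784 inodes ~= 539 KB per file
    echo "scale=1; 18314445 * 1024 / 34752784" | bc    # -> ~539.6

    # Two data nodes carry the RAID-6-backed bricks; the third node only
    # stores metadata (the arbiter brick), so it needs far less disk.
    gluster volume create vol0 replica 3 arbiter 1 \
        srv1:/bricks/raid6/data \
        srv2:/bricks/raid6/data \
        arb1:/bricks/arbiter/data
    gluster volume start vol0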
What if you have two fast 2 TB SSDs per server in hardware RAID 1, and 3 hosts in replica 3, with dual 10 Gb enterprise NICs? This would end up being a single 2 TB volume, correct? Seems like that would offer great speed and pretty decent survivability.

On Wed, Jun 5, 2019 at 11:54 PM Hu Bert <revirii at googlemail.com> wrote:
> [...]
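A quick back-of-the-envelope for that capacity question, assuming one 2-disk RAID 1 set per node and a pure replica 3 volume (no distribute):

    # Raw:       3 nodes x 2 SSDs x 2 TB                 = 12 TB
    # HW RAID 1: halves each node to a single 2 TB brick =  6 TB
    # Replica 3: one full copy per node, so usable space = one brick
    echo "usable TB: $(( 3 * 2 * 2 / 2 / 3 ))"   # -> usable TB: 2

So yes, that layout yields roughly a single 2 TB volume, with redundancy both inside each node (RAID 1) and across nodes (replica 3).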