Hi,

> > We are also using 10TB disks, heal takes 7-8 days.
> > You can play with the "cluster.shd-max-threads" setting. It is default 1 I
> > think. I am using it with 4.
> > Below you can find more info:
> > https://access.redhat.com/solutions/882233
> cluster.shd-max-threads: 8
> cluster.shd-wait-qlength: 10000

Our setup:
cluster.shd-max-threads: 2
cluster.shd-wait-qlength: 10000

> >> Volume Name: shared
> >> Type: Distributed-Replicate
> Ah, you have a distributed-replicated volume, but I chose only replicated
> (for beginning simplicity :)
> Maybe replicated volumes heal faster?

Well, maybe our setup with 3 servers and 4 disks = 4 bricks each == 12 bricks, resulting in a distributed-replicate volume (all /dev/sd{a,b,c,d} identical), isn't optimal? And would it be better to create a replica 3 volume with only 1 (big) brick per server (the 4 disks combined into either a logical volume or sw/hw RAID)?

But it would be interesting to know if a replicate volume heals faster than a distributed-replicate volume - even if only 1 brick were faulty.

Thx
Hubert
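PS: for anyone who wants to try this, the options above are set per volume. A minimal sketch, assuming the volume is named "shared" as in the quoted output ("heal info summary" needs a reasonably recent Gluster release):

    # show the current values (defaults apply if never set)
    gluster volume get shared cluster.shd-max-threads
    gluster volume get shared cluster.shd-wait-qlength

    # raise the number of parallel self-heal jobs per brick (default: 1)
    gluster volume set shared cluster.shd-max-threads 4
    gluster volume set shared cluster.shd-wait-qlength 10000

    # watch how many entries are still pending heal
    gluster volume heal shared info summary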
Amar Tumballi Suryanarayan
2019-Jan-22 06:10 UTC
[Gluster-users] usage of harddisks: each hdd a brick? raid?
On Thu, Jan 10, 2019 at 1:56 PM Hu Bert <revirii at googlemail.com> wrote:
>
> But it would be interesting to know if a replicate volume heals
> faster than a distributed-replicate volume - even if only 1 brick
> were faulty.

We don't have any data point to agree to this, but it may be true. Especially as the crawling can get a little slower when DHT (i.e. distribute) is involved, which means the healing would get slower too.

We are trying to experiment with a few performance enhancement patches (like https://review.gluster.org/20636); it would be great to see how things work with the newer base. Will keep the list updated about performance numbers once we have some more data on them.

-Amar

--
Amar Tumballi (amarts)
Hey there, just a little update... This week we switched from our 3 "old" gluster servers to 3 new ones, and with that we threw some hardware at the problem...

old: 3 servers, each with 4 * 10 TB disks; each disk used as a brick -> 4 x 3 = 12 bricks, distribute-replicate

new: 3 servers, each with 10 * 10 TB disks; we built 2 RAID10 arrays (6 disks and 4 disks), each RAID10 being a brick -> we split our data into 2 volumes, 1 x 3 = 3 bricks, replicate; as filesystem we now use XFS (instead of ext4) with mount options inode64,noatime,nodiratime.

What we've seen so far: the volumes are independent - if one volume is under load, the other one isn't affected by it. Throughput, latency etc. seem to be better now.

Of course you waste a lot of disk space with a RAID10 plus replicate setup: 100 TB per server (so 300 TB in total) result in ~50 TB of volume size. But during the last year we had problems due to hard disk errors and the resulting brick restores (reset-brick), which took very long. Was a hard time... :-/

So our conclusion was: as a heal can be really painful, take very long and hurt performance badly -> try to avoid having to do "big" heals at all. That's why we chose RAID10: under normal circumstances (a disk failing from time to time) there may be a RAID resync, but that should be faster and cause fewer performance issues than having to restore a complete brick.

Or, more generally: if you have big, slow disks and quite high I/O -> think about not using single disks as bricks. If you have the hardware (and the money), think about using RAID1 or RAID10. The smaller and/or faster the disks are (e.g. if you have a lot of 1 TB SSDs/NVMes), using them directly as bricks might work better, as (in case of a disk failure) the heal should be much faster. No information about RAID5/6 possible, that wasn't taken into consideration...

Just my 2 cents from (still) a gluster amateur :-) A rough command-line sketch of the new layout follows below.
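For reference, a sketch of how one such brick and volume is put together (hostnames, device names and volume names are placeholders, not our actual ones; the mkfs inode size follows the common XFS-for-bricks recommendation):

    # build one RAID10 array from 6 disks (placeholder devices)
    mdadm --create /dev/md0 --level=10 --raid-devices=6 /dev/sd[b-g]

    # XFS on the array; inode64/noatime/nodiratime are applied at mount time
    mkfs.xfs -i size=512 /dev/md0
    mkdir -p /gluster/brick1
    mount -o inode64,noatime,nodiratime /dev/md0 /gluster/brick1

    # one brick per server -> a pure replica 3 volume, no distribute layer
    gluster volume create shared1 replica 3 \
        srv1:/gluster/brick1/data srv2:/gluster/brick1/data srv3:/gluster/brick1/data
    gluster volume start shared1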
Best regards,
Hubert