On 2020-09-11 05:27, Martin Bähr wrote:
> Excerpts from Gionatan Danti's message of 2020-09-11 00:35:52 +0200:
>> The main point was the potentially long heal time
>
> could you (or anyone else) please elaborate on what long heal times are
> to be expected?

Hi, there are multiple factors at work here:
- healing goes over the network (gluster) rather than over an internal
  bus (RAID rebuild);
- gluster is a user-space application which puts a significant load on
  the CPU;
- healing proceeds per-file and not in LBA order (ie: it has to traverse
  all the affected files/dirs, which means scattered random IO for the
  most part);
- other things which I am surely missing.

> we have a 3-node replica cluster running version 3.12.9 (we are building
> a new cluster now) with 32TiB of space. each node has a single brick on
> top of a 7-disk raid5 (linux softraid)

3.12.9, while being the official RHEL 7 release, is very old now.

> at one point we had one node unavailable for one month (gluster failed
> to start up properly on that node and we didn't have monitoring in place
> to notice) and the accumulated changes of one month of operation took 4
> months to heal. i would have expected this ideally to take 2 weeks or
> less, one month at the worst (ie faster than or at least as fast as it
> took to create the data but not slower, and especially not 4 times
> slower)

Wow, 4 months is a lot... but you had at least internal redundancy
(RAID5 bricks). The OP was asking about running with *no* internal
redundancy, and this is the reason I advise against it: losing a disk
while needing weeks to heal is not good.

> the initial heal count was about 6million files for one node and
> 5.4million for the other.
> ...
> we do have a few huge directories with 250000, 88000, 60000 and 29000
> subdirectories each. in total 26TiB of small files, but no more than
> a few 1000 per directory. (it's user data, some have more, some have
> less)
>
> could those huge directories be responsible for the slow healing?

The very high number of to-be-healed files surely has a negative impact
on your heal speed.

Regards.

-- 
Danti Gionatan
Supporto Tecnico
Assyoma S.r.l. - www.assyoma.it
email: g.danti at assyoma.it - info at assyoma.it
GPG public key ID: FF5F32A8

Excerpts from Gionatan Danti's message of 2020-09-11 08:34:04 +0200:
> > we have a 3-node replica cluster running version 3.12.9
> > with 32TiB of space. each node has a single brick on
> > top of a 7-disk raid5 (linux softraid)
> 3.12.9, while being the official RHEL 7 release, is very old now.

yes, i am aware. we didn't bother upgrading as we need to expand
capacity and it's cheaper to rent new servers than expand the old ones.

> > the accumulated changes of one month of operation took 4 months to
> > heal.
> Wow, 4 months is a lot... but you had at least internal redundancy
> (RAID5 bricks).

right, that, and we had 3 replicas. we could have just dropped the third
node, and would still have been ok.

for the new cluster we decided that 2 nodes is enough, because the data
is all backups anyways. even if we lose both nodes, we can at least in
theory still recover all the data.

whether that's a good decision is a risk calculation. is a third server
worth the extra expense? we decided that, for what is essentially a
backup, it's not.

i considered using 3 nodes and dropping the raid instead, but several
comments, including yours, convinced me that keeping the raid is good.

each of the new servers will have 3 bricks, with 5 disks in a raid 5
per brick.

> > the initial heal count was about 6million files for one node and
> > 5.4million for the other.
> > ...
> > we do have a few huge directories with 250000, 88000, 60000 and 29000
> > subdirectories each. in total 26TiB of small files, but no more than
> > a few 1000 per directory. (it's user data, some have more, some have
> > less)
> >
> > could those huge directories be responsible for the slow healing?
>
> The very high number of to-be-healed files surely has a negative impact
> on your heal speed.

that sounds like there is an inefficiency within the healing process
that makes the healing speed non-linear in the number of files.

greetings, martin.

-- 
general manager      realss.com
student mentor       fossasia.org
community mentor     blug.sh  beijinglug.club
pike programmer      pike.lysator.liu.se  caudium.net  societyserver.org
Martin Bähr          working in china  http://societyserver.org/mbaehr/
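
To put the figures from this thread in perspective, here is a rough back-of-the-envelope calculation of the average heal rate. It is only a sketch: it assumes that "4 months" means about 120 days and that the initial heal count of roughly 6 million files is the total that had to be healed on that node. The function name `heal_rate` is made up for this illustration and is not part of any gluster tooling.

```python
SECONDS_PER_DAY = 86_400


def heal_rate(files: int, days: float) -> float:
    """Average number of files healed per second over the given period."""
    return files / (days * SECONDS_PER_DAY)


# Figures from the thread: ~6 million files pending on one node.
# Assumption: "4 months" is taken as 120 days.
pending_files = 6_000_000
observed = heal_rate(pending_files, 120)

# Rate that would have been needed to finish in the hoped-for 2 weeks.
target = heal_rate(pending_files, 14)

print(f"observed average: {observed:.2f} files/s")      # observed average: 0.58 files/s
print(f"needed for 2 weeks: {target:.2f} files/s")      # needed for 2 weeks: 4.96 files/s
```

Under these assumptions the heal averaged well under one file per second, which fits the description of per-file healing dominated by scattered random IO rather than by raw throughput.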