On Sun, 21 Jun 2020 at 19:43, Gionatan Danti <g.danti at assyoma.it> wrote:

> For the RAID6/10 setup, I found no issues: simply replace the broken
> disk without involving Gluster at all. However, this also means facing
> the "iops wall" I described earlier for a single-brick node. Going
> full-Gluster with JBODs would be interesting from a performance
> standpoint, but this complicates eventual recovery from bad disks.
>
> Does someone use Gluster in JBOD mode? If so, can you share your
> experience?
> Thanks.

Hi,

we once used Gluster with disks in JBOD mode (3 servers, 4 x 10TB HDDs
each, 4 x 3 = 12 bricks), and to make it short: in our special case it
wasn't that funny. Big HDDs, lots of small files, (highly) concurrent
access through our application. It was running quite fine, until a disk
failed. The reset-disk took ~30 (!) days, as Gluster was
copying/restoring the data alongside the normal application reads and
writes. After the first reset had finished, a couple of days later
another disk died, and the fun started again :-) Maybe a bad use case.
(A sketch of the reset-brick flow this refers to follows after this
mail.)

With this experience, the next setup was: splitting the data into 2
chunks (high I/O, low I/O), 3 servers with 2 RAID10 arrays each (same
type of disk), each array used as a brick, resulting in replica 3:
1 x 3 = 3. Changing a failed disk now means a complete RAID resync, but
regarding I/O this is far better than using reset-disk on a plain HDD.
Only the regularly running RAID check was a bit of a performance issue.

The latest setup (for the high-I/O part) looks like this: 3 servers, 10
disks with 10TB each -> 5 RAID1 arrays per server, forming a
distribute-replicate volume with 5 bricks per server, 5 x 3 = 15. No
disk has failed so far (fingers crossed), but if a disk fails now,
Gluster keeps running with all bricks available, and after replacing
the failed disk there is one RAID resync running, affecting only 1/5 of
the volume. In theory that should be better ;-) The regularly running
RAID checks are no problem so far: of the 15 RAID1 arrays only 1 is
checked at a time, never in parallel.

Disclaimer: JBOD may work better with SSDs/NVMes - untested ;-)

Best regards,
Hubert
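For readers who haven't done this: the "reset-disk" mentioned above maps to
Gluster's reset-brick workflow. A minimal sketch of that flow, with volume
name, hostname, device and brick path made up for illustration (not taken
from the setup described above):

    # Take the failed brick offline (all names below are placeholders):
    gluster volume reset-brick myvol server1:/gluster/brick1/data start

    # Swap the physical disk, recreate the filesystem and mount it:
    mkfs.xfs /dev/sdX                      # sdX = the replacement disk
    mount /dev/sdX /gluster/brick1

    # Re-add the now-empty brick; the self-heal daemon then copies the
    # data back - the phase that took ~30 days in the JBOD case above:
    gluster volume reset-brick myvol server1:/gluster/brick1/data \
        server1:/gluster/brick1/data commit force

    # Healing progress can be watched with:
    gluster volume heal myvol info

The long tail comes from the last step: the whole brick has to be rebuilt
from the other replicas while normal application I/O keeps hitting the same
spindles.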
Hi Hubert,

keep in mind that RH recommends disks of 2-3 TB, not 10 TB. I guess that is
what changed the situation. For NVMe/SSD a RAID controller is pointless, so
JBOD makes the most sense.

Best Regards,
Strahil Nikolov

On 22 June 2020 7:58:56 GMT+03:00, Hu Bert <revirii at googlemail.com> wrote:
> On Sun, 21 Jun 2020 at 19:43, Gionatan Danti <g.danti at assyoma.it> wrote:
>
>> For the RAID6/10 setup, I found no issues: simply replace the broken
>> disk without involving Gluster at all. However, this also means facing
>> the "iops wall" I described earlier for a single-brick node. Going
>> full-Gluster with JBODs would be interesting from a performance
>> standpoint, but this complicates eventual recovery from bad disks.
>>
>> Does someone use Gluster in JBOD mode? If so, can you share your
>> experience?
>> Thanks.
>
> Hi,
> we once used Gluster with disks in JBOD mode (3 servers, 4 x 10TB HDDs
> each, 4 x 3 = 12 bricks), and to make it short: in our special case it
> wasn't that funny. Big HDDs, lots of small files, (highly) concurrent
> access through our application. It was running quite fine, until a disk
> failed. The reset-disk took ~30 (!) days, as Gluster was
> copying/restoring the data alongside the normal application reads and
> writes. After the first reset had finished, a couple of days later
> another disk died, and the fun started again :-) Maybe a bad use case.
>
> With this experience, the next setup was: splitting the data into 2
> chunks (high I/O, low I/O), 3 servers with 2 RAID10 arrays each (same
> type of disk), each array used as a brick, resulting in replica 3:
> 1 x 3 = 3. Changing a failed disk now means a complete RAID resync, but
> regarding I/O this is far better than using reset-disk on a plain HDD.
> Only the regularly running RAID check was a bit of a performance issue.
>
> The latest setup (for the high-I/O part) looks like this: 3 servers, 10
> disks with 10TB each -> 5 RAID1 arrays per server, forming a
> distribute-replicate volume with 5 bricks per server, 5 x 3 = 15. No
> disk has failed so far (fingers crossed), but if a disk fails now,
> Gluster keeps running with all bricks available, and after replacing
> the failed disk there is one RAID resync running, affecting only 1/5 of
> the volume. In theory that should be better ;-) The regularly running
> RAID checks are no problem so far: of the 15 RAID1 arrays only 1 is
> checked at a time, never in parallel.
>
> Disclaimer: JBOD may work better with SSDs/NVMes - untested ;-)
>
> Best regards,
> Hubert
On 2020-06-22 06:58, Hu Bert wrote:
> On Sun, 21 Jun 2020 at 19:43, Gionatan Danti <g.danti at assyoma.it> wrote:
>
>> For the RAID6/10 setup, I found no issues: simply replace the broken
>> disk without involving Gluster at all. However, this also means facing
>> the "iops wall" I described earlier for a single-brick node. Going
>> full-Gluster with JBODs would be interesting from a performance
>> standpoint, but this complicates eventual recovery from bad disks.
>>
>> Does someone use Gluster in JBOD mode? If so, can you share your
>> experience?
>> Thanks.
>
> Hi,
> we once used Gluster with disks in JBOD mode (3 servers, 4 x 10TB HDDs
> each, 4 x 3 = 12 bricks), and to make it short: in our special case it
> wasn't that funny. Big HDDs, lots of small files, (highly) concurrent
> access through our application. It was running quite fine, until a disk
> failed. The reset-disk took ~30 (!) days, as Gluster was
> copying/restoring the data alongside the normal application reads and
> writes. After the first reset had finished, a couple of days later
> another disk died, and the fun started again :-) Maybe a bad use case.

Hi Hubert,
this is the exact scenario which scares me if/when using JBOD. Maybe for
virtual machine disks (i.e. big files) it would be faster, but still...

> The latest setup (for the high-I/O part) looks like this: 3 servers, 10
> disks with 10TB each -> 5 RAID1 arrays per server, forming a
> distribute-replicate volume with 5 bricks per server, 5 x 3 = 15. No
> disk has failed so far (fingers crossed), but if a disk fails now,
> Gluster keeps running with all bricks available, and after replacing
> the failed disk there is one RAID resync running, affecting only 1/5 of
> the volume. In theory that should be better ;-) The regularly running
> RAID checks are no problem so far: of the 15 RAID1 arrays only 1 is
> checked at a time, never in parallel.

Ok, so you multiplied the number of bricks by using multiple RAID1
arrays. Good idea, it should work fine in this manner. (A command-line
sketch of such a layout follows after this mail.)

> Disclaimer: JBOD may work better with SSDs/NVMes - untested ;-)

Yeah, I think so!
Regards.

--
Danti Gionatan
Supporto Tecnico
Assyoma S.r.l. - www.assyoma.it
email: g.danti at assyoma.it - info at assyoma.it
GPG public key ID: FF5F32A8
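For illustration only, a rough command-line sketch of the "one RAID1 array
per brick" layout discussed above (5 x 3 = 15 bricks). All device names,
hostnames, volume and brick paths are invented, not Hubert's actual
configuration:

    # One RAID1 array per pair of disks; repeat for brick2..brick5:
    mdadm --create /dev/md10 --level=1 --raid-devices=2 /dev/sdb /dev/sdc
    mkfs.xfs /dev/md10
    mkdir -p /gluster/brick1
    mount /dev/md10 /gluster/brick1

    # 5 bricks per server on 3 servers -> distribute-replicate 5 x 3.
    # Gluster groups each consecutive set of 3 bricks into one replica
    # set, so listing the bricks per index spreads every set across all
    # 3 servers:
    gluster volume create myvol replica 3 \
        server{1..3}:/gluster/brick1/data \
        server{1..3}:/gluster/brick2/data \
        server{1..3}:/gluster/brick3/data \
        server{1..3}:/gluster/brick4/data \
        server{1..3}:/gluster/brick5/data
    gluster volume start myvol

With a layout like this a single failed disk only degrades one RAID1 array,
and the subsequent resync touches just that brick's data, which matches the
"1/5 of the volume" point above.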