thr3ads.net - Gluster users - [Gluster-users] Replica 3 scale out and ZFS bricks [Sep 2020]

Alexander Iliev

2020-Sep-16 08:45 UTC

[Gluster-users] Replica 3 scale out and ZFS bricks

Hi list,

I am in the process of planning a 3-node replica 3 setup and I have a 
question about scaling it out.

From what I understood, in order to be able to scale it one node at a 
time, I need to set up the initial nodes with a number of bricks that is 
a multiple of 3 (e.g., 3, 6, 9, etc. bricks). The initial cluster will 
be able to export a volume as large as the storage of a single node and 
adding one more node will grow the volume by 1/3 (assuming homogeneous 
nodes.)

Please let me know if my understanding is correct.

My plan is to use ZFS as the underlying system for the bricks. Now I'm 
wondering - if I join the disks on each node in a, say, RAIDZ2 pool and 
then create a dataset within the pool for each brick, the GlusterFS 
volume would report the volume size 3x$brick_size, because each brick 
shares the same pool and the size/free space is reported according to 
the ZFS pool size/free space.

How should I go about this? Should I create a ZFS pool per brick (this 
seems to have a negative impact on performance)? Should I set a quota 
for each dataset?

Does my plan even make sense?

Thank you!

Best regards,
-- 
alexander iliev

Strahil Nikolov

2020-Sep-16 19:53 UTC

On Wednesday, 16 September 2020, 11:54:57 GMT+3, Alexander Iliev
<ailiev+gluster at mamul.org> wrote:

> From what I understood, in order to be able to scale it one node at a
> time, I need to set up the initial nodes with a number of bricks that is
> a multiple of 3 (e.g., 3, 6, 9, etc. bricks). The initial cluster will
> be able to export a volume as large as the storage of a single node and
> adding one more node will grow the volume by 1/3 (assuming homogeneous
> nodes.)

You can't add 1 node to a replica 3, so no - you won't get 1/3 with
that extra node.
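
For reference, the arithmetic behind this exchange can be sketched as follows. This is only an illustration, not Gluster code: the helper name `usable_capacity` and the sample sizes are made up, and the sketch assumes nothing more than that a replicated volume takes bricks in multiples of the replica count and that each replica set holds one copy's worth of data. It does not model brick placement; the bricks of one replica set should still sit on three different nodes.

```python
def usable_capacity(brick_count: int, brick_size_tb: float, replica: int = 3) -> float:
    """Usable size in TB of a distributed-replicate volume (hypothetical helper)."""
    if brick_count % replica != 0:
        raise ValueError(
            f"brick count {brick_count} is not a multiple of replica {replica}"
        )
    # Every group of `replica` bricks forms one replica set that stores
    # a single copy's worth of data.
    replica_sets = brick_count // replica
    return replica_sets * brick_size_tb


# Hypothetical numbers: 3 nodes x 3 bricks of 4 TB each -> 9 bricks, 3 replica sets.
print(usable_capacity(9, 4.0))
# 12 bricks of the same size -> 4 replica sets.
print(usable_capacity(12, 4.0))
```

Calling `usable_capacity(10, 4.0)` raises `ValueError`, which mirrors the point that a replica 3 volume cannot grow by a single brick.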

> My plan is to use ZFS as the underlying system for the bricks. Now I'm
> wondering - if I join the disks on each node in a, say, RAIDZ2 pool and
> then create a dataset within the pool for each brick, the GlusterFS
> volume would report the volume size 3x$brick_size, because each brick
> shares the same pool and the size/free space is reported according to
> the ZFS pool size/free space.

I'm not sure about ZFS (never played with it on Linux), but on my systems I
set up a thin pool consisting of all HDDs in a striped way (when no hardware RAID
controller is available) and then set up thin LVs for each brick.
In thin LVM you can define a virtual size, and this size is reported as the volume
size (assuming that all bricks are the same size).

If you have 1 RAIDZ2 pool per Gluster TSP node, then that pool's size is the
maximum size of your volume. If you plan to use snapshots, then you should set
a quota on the volume to control the usage.
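
The size-reporting concern from the original question can be illustrated with a small sketch. It rests on assumptions that are not stated in the thread: that each brick reports the size of its underlying filesystem, that a ZFS dataset with a quota reports the quota as its size, and that the volume size is the sum over replica sets. The function name and all numbers are hypothetical.

```python
from typing import Sequence


def reported_volume_size(size_per_replica_set_tb: Sequence[float]) -> float:
    """Sum the size (TB) reported by one brick of each replica set."""
    return sum(size_per_replica_set_tb)


pool_tb = 12.0     # hypothetical usable size of the RAIDZ2 pool on each node
replica_sets = 3   # three bricks (datasets) per node, one per replica set

# Without quotas, every dataset reports the whole pool, so the pool is
# counted once per replica set.
no_quota = reported_volume_size([pool_tb] * replica_sets)

# With a quota of pool/3 on each dataset, each brick reports only its share.
with_quota = reported_volume_size([pool_tb / replica_sets] * replica_sets)

print(no_quota, with_quota)
```

Under these assumptions the unquota'd layout reports three times the real pool size, while per-dataset quotas bring the reported size back to the size of one pool.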

> How should I go about this? Should I create a ZFS pool per brick (this
> seems to have a negative impact on performance)? Should I set a quota
> for each dataset?

I would go with 1 RAIDZ2 pool with 1 dataset of type 'filesystem' per
Gluster node. Quota is always good to have.

P.S.: Any reason to use ZFS? It uses a lot of memory.

Best Regards,
Strahil Nikolov
