Hey folks.

I'm looking to deploy GlusterFS to host some VMs. I've done a lot of reading and would like to implement deduplication and compression in this setup. My thought would be to run ZFS to handle the compression and deduplication.

ZFS would give me the following benefits:
1. If a single disk fails, rebuilds happen locally instead of over the network.
2. ZIL & L2ARC should add a slight performance increase.
3. Deduplication and compression are inline and have pretty good performance with modern hardware (Intel Skylake).
4. Automated snapshotting.

I can then layer GlusterFS on top to handle distribution and give me 3x replicas of my storage.

My question is: why aren't more people doing this? Is this a horrible idea for some reason that I'm missing? I'd be very interested to hear your thoughts.

Additional thoughts:
- I'd like to use Ganesha pNFS to connect to this storage. (Any issues here?)
- I think I'd need keepalived across these three nodes for the address I put in fstab. (Is this correct?)
- I'm also thinking about creating a "Gluster tier" of 512GB of Intel Optane DIMM to really smooth out write latencies. Any issues here?

Thank you,
Cody Hill
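A rough sketch of the stack described above, assuming three nodes node1..node3, a pool named tank, and a volume named vmstore (all names hypothetical); the mirror layout and the dedup setting are illustrative placeholders, not recommendations:

    # on each node: build the ZFS brick with inline compression (and, optionally, dedup)
    zpool create tank mirror /dev/sdb /dev/sdc
    zfs set compression=lz4 tank
    zfs set dedup=on tank            # RAM-hungry; benchmark before enabling
    zfs create tank/brick1
    mkdir -p /tank/brick1/brick      # use a subdirectory, not the dataset mountpoint itself

    # on one node: probe the peers and create the replica 3 volume
    gluster peer probe node2
    gluster peer probe node3
    gluster volume create vmstore replica 3 \
        node1:/tank/brick1/brick node2:/tank/brick1/brick node3:/tank/brick1/brick
    gluster volume start vmstore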
Hi Cody,

I'm still new to Gluster myself, so take my input with the necessary skepticism.

If you care about performance (and it looks like you do), use ZFS mirror pairs rather than raidz volumes. In my experience (outside of Gluster), raidz pools perform significantly worse than a hardware RAID5 or RAID6. If you combine a mirror on ZFS with 3x replication in Gluster, you need 6x the amount of raw disk space to get your desired redundancy. You could get by with 3x the amount of disk space if you left the ZFS mirror away and accepted rebuilding a lost disk over the network, or you could end up somewhere between 3x and 6x if you used hardware RAID6 on the bricks instead of ZFS. When using hardware RAID6, make sure you align your LVM volumes properly; it makes a huge difference in performance. Deduplication might give you some of that space back, but benchmark the ZFS deduplication process before deciding on it. In theory it could even add to your write performance, but I'm not sure that's going to happen in reality.

Snapshotting might be tricky. AFAIK Gluster natively supports snapshotting with thin-provisioned LVM volumes only. That lets you create snapshots with the "gluster" CLI tool, and Gluster will then handle consistency across all your bricks so that each snapshot (as a whole, across all bricks) is consistent in itself. This includes some challenges around handling open file sessions etc. I'm not familiar with what Gluster actually does, but from reading the documentation and some discussions about snapshots, there seems to be more to it than simply automating a couple of lvcreate statements. So I would expect some challenges when doing it yourself on ZFS rather than letting Gluster handle it. Restoring a single file from a snapshot also seems a lot easier if you go with the thin-LVM setup: you can mount a snapshot (of your entire Gluster volume, not just of a brick) and simply copy the file. With ZFS it seems you would need to find out which bricks your file resided on and then copy the necessary raw data to your live bricks, which is something I would not feel comfortable doing; it is a lot more work and prone to error.

Also, if things go wrong (for example when dealing with the snapshots), there are probably not many people around to help you.

Again, I am no expert; that's just what I'd be concerned about with the little knowledge I have at the moment. :)

Cheers,
Pascal
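A rough sketch of the thin-provisioned LVM brick layout that Gluster's native snapshots expect, along the lines Pascal describes; the device, volume group, volume, and snapshot names are hypothetical and the sizes are placeholders:

    # on each node: one thin pool plus a thin LV per brick
    pvcreate /dev/sdb
    vgcreate vg_bricks /dev/sdb
    lvcreate --size 900G --thinpool thinpool_brick1 vg_bricks
    lvcreate --virtualsize 900G --thin --name lv_brick1 vg_bricks/thinpool_brick1
    mkfs.xfs /dev/vg_bricks/lv_brick1
    mount /dev/vg_bricks/lv_brick1 /bricks/brick1

    # with thin-LVM bricks, snapshots are driven from the Gluster CLI,
    # which coordinates them across all bricks of the volume
    gluster snapshot create snap1 vmstore
    gluster snapshot list
    gluster snapshot restore snap1    # restore requires the volume to be stopped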
I use ZFS over VDO because I'm more familiar with it and it suits my use case better. I got similar results from performance tests, with VDO slightly outperforming on writes and ZFS outperforming on reads. That was before I added a ZIL and cache to my ZFS disks, too. I also don't like that you have to specify estimated sizes with VDO for compression; I prefer the ZFS approach. Don't forget to set the appropriate ZFS attributes; the parts of the Gluster docs that cover those are still valid.

A few more comments inline:

On Apr 16, 2019, at 5:09 PM, Cody Hill <cody at platform9.com> wrote:

> ZFS would give me the following benefits:
> 1. If a single disk fails, rebuilds happen locally instead of over the network.

I actually run mine in a pure stripe for best performance. If a disk fails and SMART warnings didn't give me enough time to replace it inline first, I'll rebuild over the network. I have 10G of course, and currently < 10TB of data, so I consider that reasonable. I also decided I'd rather present one large brick than many smaller bricks; in some tests others have done, that has shown benefits for Gluster healing.

> 2. ZIL & L2ARC should add a slight performance increase.

Yes. Get the absolute fastest ZIL you can, but any modern enterprise SSD will still give you some benefit. Over-provision these: you probably need 4-15GB for the ZIL (1G networking vs. 10G), and I use 90% of the cache drive to let the SSD work its best. Cache effectiveness depends on your workload, so monitor and/or test with and without it.

> 3. Deduplication and compression are inline and have pretty good performance with modern hardware (Intel Skylake).

LZ4 compression is great. As others have said, I'd avoid deduplication altogether. Especially in a Gluster environment, why waste the RAM and do the work multiple times?

> 4. Automated snapshotting.

Be careful doing this "underneath" the Gluster layer: you're snapshotting only that one replica, and it's not guaranteed to be in sync with the others. At best you're making a point-in-time backup of one node, maybe useful for off-system backups with ZFS streaming, but I'd consider Gluster geo-replication first. And it won't work at all if you are not running a pure replica.

> I'd like to use Ganesha pNFS to connect to this storage. (Any issues here?)

I'd just use GlusterFS gfapi mounts, but if you want to go NFS, sure. Make sure you're ready to support Ganesha; it doesn't seem to be as well integrated in the latest Gluster releases. Caveat: I don't use it myself.

> I think I'd need keepalived across these three nodes for the address I put in fstab. (Is this correct?)

There are easier ways. I use a simple DNS round robin to a name (which I can also put in the hosts files on the servers/clients to avoid bootstrap issues when the local DNS is a VM ;)), and I set the backup-server mount option so clients can switch automatically if a node fails. Or you can mount localhost: on a converged cluster, again with the backup-server options for best results.
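A hedged illustration of that backup-server approach, assuming the volume is named vmstore and the servers are node1..node3 (hypothetical names). Note that backup-volfile-servers only covers fetching the volume file at mount time; once mounted, the replica handles node failures on its own:

    # /etc/fstab on a client; or point the first field at a round-robin DNS name
    node1:/vmstore  /mnt/vmstore  glusterfs  defaults,_netdev,backup-volfile-servers=node2:node3  0 0

    # equivalent one-off mount
    mount -t glusterfs -o backup-volfile-servers=node2:node3 node1:/vmstore /mnt/vmstore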
> I'm also thinking about creating a "Gluster tier" of 512GB of Intel Optane DIMM to really smooth out write latencies. Any issues here?

Gluster tiering is currently being dropped from support. Until/unless it comes back, I'd use the Optanes as cache/ZIL or just make a separate fast pool out of them.
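If the Optanes end up as ZIL/cache devices instead, a minimal sketch of attaching them to an existing pool; the pool and device names are hypothetical, and the partition sizes just follow the rough over-provisioning guidance given earlier in the thread:

    # a small over-provisioned partition as the SLOG (ZIL), a larger one as L2ARC
    zpool add tank log /dev/nvme0n1p1      # e.g. a ~15GB partition for the ZIL
    zpool add tank cache /dev/nvme0n1p2    # e.g. ~90% of the remaining space
    zpool status tank                      # confirm the log and cache vdevs show up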
Do check this doc: https://docs.gluster.org/en/latest/Administrator%20Guide/Gluster%20On%20ZFS/#build-install-zfs

In particular, the bit regarding xattr=sa. In the past, Gluster would cause extremely poor performance on ZFS datasets without this option set. I'm not sure if this is still the case.

- Dave
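A minimal sketch of applying the xattr=sa setting Dave mentions to a hypothetical brick dataset, plus two commonly paired properties (acltype and atime) that are an assumption here; check the linked guide for its current recommendations:

    zfs set xattr=sa tank/brick1          # store xattrs as system attributes in the dnode; Gluster uses xattrs heavily
    zfs set acltype=posixacl tank/brick1  # assumption: needed if POSIX ACLs are used on the bricks
    zfs set atime=off tank/brick1         # assumption: optional, avoids metadata writes on reads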