Hi,

We are looking into replacing our current storage solution and are evaluating gluster for this purpose. Our current solution uses a SAN with two servers attached that serve samba and NFS 4. Clients connect to those servers using NFS or SMB. All users' home directories live on this server.

I would like some insight into who else is using gluster for home directories for about 500 users and what performance they get out of the solution. Which connectivity method are you using on the clients (gluster native, nfs, smb)? Which volume options do you have configured for your gluster volume? What hardware are you using? Are you using snapshots and/or quota? If so, any numbers on the performance impact?

The solution I had in mind for our setup is multiple servers/bricks with a replica 3 arbiter 1 volume, where each server is also running nfs-ganesha and samba in HA. Clients would connect to one of the nfs servers (dns round robin); in this case the nfs servers would be the gluster clients. Gluster traffic would go over a dedicated 10G network with jumbo frames.

I'm currently testing gluster (3.12, now 3.13) on older machines[1] and have created a replica 3 arbiter 1 volume 2x(2+1). I seem to run into all sorts of (performance) problems. I must be doing something wrong, but I've tried all sorts of benchmarks and nothing seems to make my setup live up to what I would expect from this hardware.

* I understand that gluster only starts to work well when multiple clients are connecting in parallel, but I did expect the single-client performance to be better.

* Unpacking the linux-4.15.7.tar.xz file on the brick XFS filesystem followed by a sync takes about 1 minute (the exact commands are sketched below this list). Doing the same on the gluster volume using the fuse client (the client is one of the brick servers) takes over 9 minutes, and neither disk nor cpu nor network reaches its bottleneck. Doing the same over nfs-ganesha (the client is a workstation connected through gbit) takes even longer (more than 30 min!?). I understand that unpacking a lot of small files may be the worst workload for a distributed filesystem, but when I look at the file sizes in our users' home directories, more than 90% of the files are smaller than 1MB.

* A copy of a 300GB file over NFS 4 (nfs-ganesha) starts fast (90MB/s) and then drops to 20MB/s. When I look at the servers during the copy, I don't see where the bottleneck is: cpu, disk and network are not maxing out on any of the bricks. When the same client copies the file to our current NFS storage, it is limited by the gbit network connection of the client.

* I had the 'cluster.lookup-optimize' option enabled but ran into all sorts of issues where ls shows either the wrong files (the content of a different directory), or claims a directory does not exist when mkdir says it already exists...
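For reference, the untar and copy tests above boil down to commands along these lines (the kernel tarball is real; the mount point and the name of the 300GB file are just examples for this mail):

    # small-file test, run on the brick xfs, the fuse mount and the nfs mount:
    time sh -c 'tar xf linux-4.15.7.tar.xz && sync'

    # large-file test:
    time cp /tmp/bigfile.img /mnt/homes/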
I currently have the following options set:

server.outstanding-rpc-limit: 256
client.event-threads: 4
performance.io-thread-count: 16
performance.parallel-readdir: on
server.event-threads: 4
performance.cache-size: 2GB
performance.rda-cache-limit: 128MB
performance.write-behind-window-size: 8MB
performance.md-cache-timeout: 600
performance.cache-invalidation: on
performance.stat-prefetch: on
network.inode-lru-limit: 500000
performance.nl-cache-timeout: 600
performance.nl-cache: on
features.cache-invalidation-timeout: 600
features.cache-invalidation: on
transport.address-family: inet
nfs.disable: on
cluster.enable-shared-storage: enable

The brick servers have 2 dual-core cpus, so I've set the client and server event threads to 4.

* When using nfs-ganesha I run into bugs that make me wonder who is using nfs-ganesha with gluster and why they are not hitting these bugs:

https://bugzilla.redhat.com/show_bug.cgi?id=1543996
https://bugzilla.redhat.com/show_bug.cgi?id=1405147

* nfs-ganesha does not have the 'async' option that kernel nfs has. I can understand why they don't want to implement this feature, but I do wonder how others are increasing their nfs-ganesha performance. I've put some SSDs in each brick server and configured them as lvmcache for the bricks. This setup only increases throughput once the data is on the ssd, not for just-written data.

Regards,

Rik

[1] 4 servers with 2 1Gbit nics (one for the client traffic, one for s2s traffic with jumbo frames enabled). Each server has two disks (bricks).

[2] ioping from the nfs client shows the following latencies:
min/avg/max/mdev = 695.2 us / 2.17 ms / 7.05 ms / 1.92 ms

ping rtt from the client to the nfs-ganesha server:
rtt min/avg/max/mdev = 0.106/1.551/6.195/2.098 ms

ioping on the volume fuse-mounted from a brick server:
min/avg/max/mdev = 557.0 us / 824.4 us / 2.68 ms / 421.9 us

ioping on the brick xfs filesystem:
min/avg/max/mdev = 275.2 us / 515.2 us / 12.4 ms / 1.21 ms

Are these normal numbers?
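(For completeness, the latencies in [2] were gathered with commands of this form; the mount points and hostname are examples:)

    ioping -c 10 /mnt/homes      # on the nfs or fuse mount
    ioping -c 10 /data/brick1    # on the brick xfs filesystem
    ping -c 10 nfs1.example.org  # rtt to the nfs-ganesha node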
Hi,

Why do you need to replace your existing solution? If you don't need to scale out for capacity reasons, an async NFS server will always outperform GlusterFS.

Ondrej
Hi,

On 2018-03-07 16:35, Ondrej Valousek wrote:
> Why do you need to replace your existing solution? If you don't need to
> scale out for capacity reasons, an async NFS server will always
> outperform GlusterFS.

The current solution is 8 years old and is reaching its end of life. The reason we are also looking into gluster is that we like that it uses standard components, and that we can prevent forklift upgrades every 5-7 years by replacing a few bricks each year.

Next to providing storage for home directories, we would also like to use the hosts to run VMs in a hyperconverged setup (with their storage as an additional gluster volume on those bricks).

Regards,

Rik

--
Rik Theys
System Engineer
KU Leuven - Dept. Elektrotechniek (ESAT)
Kasteelpark Arenberg 10 bus 2440 - B-3001 Leuven-Heverlee
+32(0)16/32.11.07
----------------------------------------------------------------
<<Any errors in spelling, tact or fact are transmission errors>>
Hi Rik,

Nice clarity and detail in the description. Thanks! Inline...

On Wed, Mar 7, 2018 at 8:29 PM, Rik Theys <Rik.Theys at esat.kuleuven.be> wrote:

> [...]
>
> * I understand that gluster only starts to work well when multiple
> clients are connecting in parallel, but I did expect the single-client
> performance to be better.
>
> * Unpacking the linux-4.15.7.tar.xz file on the brick XFS filesystem
> followed by a sync takes about 1 minute. Doing the same on the gluster
> volume using the fuse client (the client is one of the brick servers)
> takes over 9 minutes, and neither disk nor cpu nor network reaches its
> bottleneck. Doing the same over nfs-ganesha (the client is a
> workstation connected through gbit) takes even longer (more than
> 30 min!?).
>
> [...]
>
> * A copy of a 300GB file over NFS 4 (nfs-ganesha) starts fast (90MB/s)
> and then drops to 20MB/s. When I look at the servers during the copy,
> I don't see where the bottleneck is: cpu, disk and network are not
> maxing out on any of the bricks. When the same client copies the file
> to our current NFS storage, it is limited by the gbit network
> connection of the client.

Both untar and cp are single-threaded, which means throughput is mostly dictated by latency. Latency is generally higher in a distributed FS; nfs-ganesha has an extra hop to the backend, and hence higher latency for most operations compared to glusterfs-fuse.

You don't necessarily need multiple clients for good performance with gluster. Many multi-threaded benchmarks give good performance from a single client. Here, for example, if you run multiple copy commands in parallel from the same client, I'd expect your aggregate transfer rate to improve.
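A minimal sketch of what I mean (the file name and mount point are made up):

    # 4 parallel write streams from a single client:
    for i in 1 2 3 4; do
        cp /tmp/bigfile.img /mnt/homes/copy.$i &
    done
    wait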
It's been a long while since I looked at nfs-ganesha. But in terms of upper bounds for throughput tests: data needs to flow over the client->nfs-server link, and then, depending on which servers the file is located on, either 1x (if the nfs-ganesha node is also hosting one copy of the file, and neglecting the arbiter) or 2x over the s2s link. With 1Gbps links, that means an upper bound between 125 MB/s and 62.5 MB/s in the steady state, unless I miscalculated.
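A back-of-the-envelope check, ignoring protocol overhead (so real numbers will be somewhat lower):

    # payload limit of a 1 Gbps link, in MB/s:
    echo $(( 1000 / 8 ))      # 125 -> the nfs-ganesha node holds one data copy locally
    echo $(( 1000 / 8 / 2 ))  # 62  -> both data copies cross the same 1 Gbps s2s link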
-- Manoj


Hi,

On 03/08/2018 10:52 AM, Manoj Pillai wrote:
> Both untar and cp are single-threaded, which means throughput is mostly
> dictated by latency. [...]
>
> [...] in terms of upper bounds for throughput tests: data needs to flow
> over the client->nfs-server link, and then, depending on which servers
> the file is located on, either 1x (if the nfs-ganesha node is also
> hosting one copy of the file, and neglecting the arbiter) or 2x over
> the s2s link. With 1Gbps links, that means an upper bound between
> 125 MB/s and 62.5 MB/s in the steady state, unless I miscalculated.

Yes, you are correct, but the speeds I'm seeing are far below 62.5MB/s. In the untar case I fully understand the overhead, as there are a lot of small files and therefore a lot of metadata operations. But for the sequential write the speed should be much better, as latency is/should be less of an issue there?

I've been trying to find some documentation on nfs-ganesha, but everything I find seems to be outdated :-(. The documentation on their wiki states:

"Version 2.0
This version is in active development and is not considered stable enough for production use. Its documentation is still incomplete."

Their latest version is 2.6.0... I also can not find what changed between 2.5 and 2.6. Sure, I can look at the git commits, but there is no maintained changelog.

From what I've read, nfs-ganesha should be able to cache a lot of data, but I can't find any good documentation on how to configure this.
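The closest I've found is the sample configs shipped with the ganesha source. Based on those (and this is a guess on my part; the block was reportedly renamed MDCACHE in 2.6, so the names below may no longer be current), the metadata cache would be sized with something like:

    # /etc/ganesha/ganesha.conf -- unverified sketch based on older sample configs
    CACHEINODE {
        Entries_HWMark = 100000;  # upper bound on cached inodes/attributes
    }

If someone has a pointer to current, maintained documentation for this, I'd love to hear it.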
Regards,

Rik

--
Rik Theys
System Engineer
KU Leuven - Dept. Elektrotechniek (ESAT)
Kasteelpark Arenberg 10 bus 2440 - B-3001 Leuven-Heverlee
+32(0)16/32.11.07
----------------------------------------------------------------
<<Any errors in spelling, tact or fact are transmission errors>>