Sam McLeod
2017-Sep-11  04:29 UTC
[Gluster-users] Gluster command performance - Setting volume options slow
Hello,
I'm trialling a deployment of GlusterFS (currently 3.10) for our
Kubernetes storage and am hitting performance problems with the
gluster command line interface used to provision and manage volumes.
When creating, modifying or setting options on volumes, gluster takes
almost exactly 1 second to run each command.
This means that if, for example, you have 60 volumes and 10 settings on
each volume, it's going to take over 16 minutes to create the volumes
and set the options on each volume.
Starting a volume with 'gluster volume start' takes 5 seconds.
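A rough sketch of where a figure of about 16 minutes can come from. The breakdown is an assumption on top of the numbers above: one create command and one start command per volume, in addition to the option changes. The function name is just for illustration.

```python
# Rough provisioning-time estimate from the figures quoted in this post.
# Assumption: each volume needs 1 create, N option sets and 1 start.
VOLUMES = 60
OPTIONS_PER_VOLUME = 10
SECONDS_PER_COMMAND = 1.0  # observed for create / set
SECONDS_PER_START = 5.0    # observed for 'gluster volume start'


def provisioning_seconds(volumes, options_per_volume,
                         per_command=SECONDS_PER_COMMAND,
                         per_start=SECONDS_PER_START):
    """Total wall-clock seconds if every command runs one after another."""
    create = volumes * per_command
    set_options = volumes * options_per_volume * per_command
    start = volumes * per_start
    return create + set_options + start


total = provisioning_seconds(VOLUMES, OPTIONS_PER_VOLUME)
print(f"{total:.0f} s = {total / 60:.1f} min")  # prints "960 s = 16.0 min"
```

The option changes alone account for 600 of those seconds, so they dominate the total.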
I can't see any technical reason why setting options on volumes should
take any longer than writing the option to disk on the local node,
issuing the change to other nodes in the cluster and verifying the
success of the change. I would expect this to take only a few
milliseconds, as long as disk and network performance is sufficient.
- I'm wondering if there's some sort of sleep-like behaviour within the
  gluster command?
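To put a number on the per-command latency, something like the following could be used. It is only a sketch: the helper names are made up for illustration, and the example command simply re-applies an option that is already set on the volume shown later in this post.

```python
import subprocess
import time


def time_command(argv):
    """Run argv once and return (elapsed_seconds, return_code)."""
    start = time.perf_counter()
    result = subprocess.run(argv, stdout=subprocess.DEVNULL,
                            stderr=subprocess.DEVNULL)
    return time.perf_counter() - start, result.returncode


def summarize(argv, runs=5):
    """Return (min, mean, max) wall-clock seconds over several runs."""
    samples = [time_command(argv)[0] for _ in range(runs)]
    return min(samples), sum(samples) / len(samples), max(samples)


if __name__ == "__main__":
    # Assumption: run on a cluster node where the gluster CLI is installed.
    cmd = ["gluster", "volume", "set", "dashingdev_storage",
           "network.ping-timeout", "5"]
    lo, mean, hi = summarize(cmd)
    print(f"min {lo:.3f}s  mean {mean:.3f}s  max {hi:.3f}s")
```

If the minimum stays pinned very close to one second regardless of load, that would be consistent with a fixed wait somewhere rather than real work.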
Any assistance would be appreciated.
---
Cluster information:
- OS: CentOS 7
- Gluster: 3.10 (newest available to CentOS 7)
- Storage: SSD (40K random read/write IOPS, 300MB/s per node)
- Memory: 16GB per node
- Network: 10Gbit (switched, same VLAN)
- Data size: none, we haven't loaded data yet
- Cluster configured with replica 3, arbiter 1
Example volume configuration:
Status of volume: dashingdev_storage
Gluster process                                                TCP Port  RDMA Port  Online  Pid
-------------------------------------------------------------------------------------------------
Brick int-gluster-01:/mnt/gluster-storage/dashingdev_storage   49192     0          Y       16798
Brick int-gluster-02:/mnt/gluster-storage/dashingdev_storage   49192     0          Y       19886
Brick int-gluster-03:/mnt/gluster-storage/dashingdev_storage   49192     0          Y       20450
Self-heal Daemon on localhost                                  N/A       N/A        N       N/A
Self-heal Daemon on int-gluster-03                             N/A       N/A        Y       32695
Self-heal Daemon on int-gluster-02                             N/A       N/A        N       N/A

Task Status of Volume dashingdev_storage
-------------------------------------------------------------------------------------------------
There are no active volume tasks

Volume Name: dashingdev_storage
Type: Replicate
Volume ID: 22937936-2e28-47a3-b65d-cfc9b2c0d069
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: int-gluster-01:/mnt/gluster-storage/dashingdev_storage
Brick2: int-gluster-02:/mnt/gluster-storage/dashingdev_storage
Brick3: int-gluster-03:/mnt/gluster-storage/dashingdev_storage
Options Reconfigured:
server.event-threads: 3
performance.write-behind-window-size: 1MB
performance.stat-prefetch: true
performance.readdir-ahead: true
performance.rda-cache-limit: 32MB
performance.parallel-readdir: true
performance.md-cache-timeout: 600
performance.io-thread-count: 8
performance.io-cache: true
performance.client-io-threads: true
performance.cache-size: 64MB
performance.cache-refresh-timeout: 4
performance.cache-invalidation: true
performance.cache-ima-xattrs: true
network.ping-timeout: 5
network.inode-lru-limit: 50000
features.cache-invalidation: true
features.cache-invalidation-timeout: 600
diagnostics.client-log-level: WARNING
diagnostics.brick-log-level: WARNING
cluster.readdir-optimize: true
cluster.lookup-optimize: true
client.event-threads: 3
transport.address-family: inet
nfs.disable: true
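
For reference, a minimal sketch of how an option list like the one above turns into individual CLI calls, each of which currently costs about a second. The function name is made up for illustration, and the sample text contains only three of the options listed above.

```python
def build_set_commands(volume, options_text):
    """Turn 'key: value' lines into one 'gluster volume set' argv per option."""
    commands = []
    for line in options_text.splitlines():
        line = line.strip()
        if not line or ":" not in line:
            continue  # skip blank or malformed lines
        key, value = line.split(":", 1)
        commands.append(["gluster", "volume", "set", volume,
                         key.strip(), value.strip()])
    return commands


sample = """
server.event-threads: 3
performance.cache-size: 64MB
network.ping-timeout: 5
"""

for argv in build_set_commands("dashingdev_storage", sample):
    print(" ".join(argv))
```

With the 25 options listed above, that is 25 separate commands per volume when each option is set with its own call.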
--
Sam McLeod
@s_mcleod[1] | smcleod.net
Links:
  1. https://twitter.com/s_mcleod
