David Cunningham
2019-Dec-27  01:22 UTC
[Gluster-users] GFS performance under heavy traffic
Hi Strahil,
Our volume options are as below. Thanks for the suggestion to upgrade to
version 6 or 7. We could do that by simply removing the current
installation and installing the new one (since it's not live right now). We
might have to convince the customer that it's likely to succeed though, as
at the moment I think they believe that GFS is not going to work for them.
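If we do go ahead with the upgrade, bumping the cluster op-version afterwards should be roughly the following (the exact number comes from what max-op-version reports, so 70000 below is only illustrative):
# gluster volume get all cluster.max-op-version
# gluster volume set all cluster.op-version 70000    <-- use the value reported above
# gluster volume get all cluster.op-version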
Option                                  Value
------                                  -----
cluster.lookup-unhashed                 on
cluster.lookup-optimize                 on
cluster.min-free-disk                   10%
cluster.min-free-inodes                 5%
cluster.rebalance-stats                 off
cluster.subvols-per-directory           (null)
cluster.readdir-optimize                off
cluster.rsync-hash-regex                (null)
cluster.extra-hash-regex                (null)
cluster.dht-xattr-name                  trusted.glusterfs.dht
cluster.randomize-hash-range-by-gfid    off
cluster.rebal-throttle                  normal
cluster.lock-migration                  off
cluster.force-migration                 off
cluster.local-volume-name               (null)
cluster.weighted-rebalance              on
cluster.switch-pattern                  (null)
cluster.entry-change-log                on
cluster.read-subvolume                  (null)
cluster.read-subvolume-index            -1
cluster.read-hash-mode                  1
cluster.background-self-heal-count      8
cluster.metadata-self-heal              on
cluster.data-self-heal                  on
cluster.entry-self-heal                 on
cluster.self-heal-daemon                on
cluster.heal-timeout                    600
cluster.self-heal-window-size           1
cluster.data-change-log                 on
cluster.metadata-change-log             on
cluster.data-self-heal-algorithm        (null)
cluster.eager-lock                      on
disperse.eager-lock                     on
disperse.other-eager-lock               on
disperse.eager-lock-timeout             1
disperse.other-eager-lock-timeout       1
cluster.quorum-type                     none
cluster.quorum-count                    (null)
cluster.choose-local                    true
cluster.self-heal-readdir-size          1KB
cluster.post-op-delay-secs              1
cluster.ensure-durability               on
cluster.consistent-metadata             no
cluster.heal-wait-queue-length          128
cluster.favorite-child-policy           none
cluster.full-lock                       yes
cluster.stripe-block-size               128KB
cluster.stripe-coalesce                 true
diagnostics.latency-measurement         off
diagnostics.dump-fd-stats               off
diagnostics.count-fop-hits              off
diagnostics.brick-log-level             INFO
diagnostics.client-log-level            INFO
diagnostics.brick-sys-log-level         CRITICAL
diagnostics.client-sys-log-level        CRITICAL
diagnostics.brick-logger                (null)
diagnostics.client-logger               (null)
diagnostics.brick-log-format            (null)
diagnostics.client-log-format           (null)
diagnostics.brick-log-buf-size          5
diagnostics.client-log-buf-size         5
diagnostics.brick-log-flush-timeout     120
diagnostics.client-log-flush-timeout    120
diagnostics.stats-dump-interval         0
diagnostics.fop-sample-interval         0
diagnostics.stats-dump-format           json
diagnostics.fop-sample-buf-size         65535
diagnostics.stats-dnscache-ttl-sec      86400
performance.cache-max-file-size         0
performance.cache-min-file-size         0
performance.cache-refresh-timeout       1
performance.cache-priority
performance.cache-size                  32MB
performance.io-thread-count             16
performance.high-prio-threads           16
performance.normal-prio-threads         16
performance.low-prio-threads            16
performance.least-prio-threads          1
performance.enable-least-priority       on
performance.iot-watchdog-secs           (null)
performance.iot-cleanup-disconnected-reqs  off
performance.iot-pass-through            false
performance.io-cache-pass-through       false
performance.cache-size                  128MB
performance.qr-cache-timeout            1
performance.cache-invalidation          false
performance.ctime-invalidation          false
performance.flush-behind                on
performance.nfs.flush-behind            on
performance.write-behind-window-size    1MB
performance.resync-failed-syncs-after-fsync  off
performance.nfs.write-behind-window-size     1MB
performance.strict-o-direct             off
performance.nfs.strict-o-direct         off
performance.strict-write-ordering       off
performance.nfs.strict-write-ordering   off
performance.write-behind-trickling-writes      on
performance.aggregate-size              128KB
performance.nfs.write-behind-trickling-writes  on
performance.lazy-open                   yes
performance.read-after-open             yes
performance.open-behind-pass-through    false
performance.read-ahead-page-count       4
performance.read-ahead-pass-through     false
performance.readdir-ahead-pass-through  false
performance.md-cache-pass-through       false
performance.md-cache-timeout            1
performance.cache-swift-metadata        true
performance.cache-samba-metadata        false
performance.cache-capability-xattrs     true
performance.cache-ima-xattrs            true
performance.md-cache-statfs             off
performance.xattr-cache-list
performance.nl-cache-pass-through       false
features.encryption                     off
encryption.master-key                   (null)
encryption.data-key-size                256
encryption.block-size                   4096
network.frame-timeout                   1800
network.ping-timeout                    42
network.tcp-window-size                 (null)
network.remote-dio                      disable
client.event-threads                    2
client.tcp-user-timeout                 0
client.keepalive-time                   20
client.keepalive-interval               2
client.keepalive-count                  9
network.tcp-window-size                 (null)
network.inode-lru-limit                 16384
auth.allow                              *
auth.reject                             (null)
transport.keepalive                     1
server.allow-insecure                   on
server.root-squash                      off
server.anonuid                          65534
server.anongid                          65534
server.statedump-path                   /var/run/gluster
server.outstanding-rpc-limit            64
server.ssl                              (null)
auth.ssl-allow                          *
server.manage-gids                      off
server.dynamic-auth                     on
client.send-gids                        on
server.gid-timeout                      300
server.own-thread                       (null)
server.event-threads                    1
server.tcp-user-timeout                 0
server.keepalive-time                   20
server.keepalive-interval               2
server.keepalive-count                  9
transport.listen-backlog                1024
ssl.own-cert                            (null)
ssl.private-key                         (null)
ssl.ca-list                             (null)
ssl.crl-path                            (null)
ssl.certificate-depth                   (null)
ssl.cipher-list                         (null)
ssl.dh-param                            (null)
ssl.ec-curve                            (null)
transport.address-family                inet
performance.write-behind                on
performance.read-ahead                  on
performance.readdir-ahead               on
performance.io-cache                    on
performance.quick-read                  on
performance.open-behind                 on
performance.nl-cache                    off
performance.stat-prefetch               on
performance.client-io-threads           off
performance.nfs.write-behind            on
performance.nfs.read-ahead              off
performance.nfs.io-cache                off
performance.nfs.quick-read              off
performance.nfs.stat-prefetch           off
performance.nfs.io-threads              off
performance.force-readdirp              true
performance.cache-invalidation          false
features.uss                            off
features.snapshot-directory             .snaps
features.show-snapshot-directory        off
features.tag-namespaces                 off
network.compression                     off
network.compression.window-size         -15
network.compression.mem-level           8
network.compression.min-size            0
network.compression.compression-level   -1
network.compression.debug               false
features.default-soft-limit             80%
features.soft-timeout                   60
features.hard-timeout                   5
features.alert-time                     86400
features.quota-deem-statfs              off
geo-replication.indexing                off
geo-replication.indexing                off
geo-replication.ignore-pid-check        off
geo-replication.ignore-pid-check        off
features.quota                          off
features.inode-quota                    off
features.bitrot                         disable
debug.trace                             off
debug.log-history                       no
debug.log-file                          no
debug.exclude-ops                       (null)
debug.include-ops                       (null)
debug.error-gen                         off
debug.error-failure                     (null)
debug.error-number                      (null)
debug.random-failure                    off
debug.error-fops                        (null)
nfs.disable                             on
features.read-only                      off
features.worm                           off
features.worm-file-level                off
features.worm-files-deletable           on
features.default-retention-period       120
features.retention-mode                 relax
features.auto-commit-period             180
storage.linux-aio                       off
storage.batch-fsync-mode                reverse-fsync
storage.batch-fsync-delay-usec          0
storage.owner-uid                       -1
storage.owner-gid                       -1
storage.node-uuid-pathinfo              off
storage.health-check-interval           30
storage.build-pgfid                     off
storage.gfid2path                       on
storage.gfid2path-separator             :
storage.reserve                         1
storage.health-check-timeout            10
storage.fips-mode-rchecksum             off
storage.force-create-mode               0000
storage.force-directory-mode            0000
storage.create-mask                     0777
storage.create-directory-mask           0777
storage.max-hardlinks                   100
storage.ctime                           off
storage.bd-aio                          off
config.gfproxyd                         off
cluster.server-quorum-type              off
cluster.server-quorum-ratio             0
changelog.changelog                     off
changelog.changelog-dir                 {{ brick.path }}/.glusterfs/changelogs
changelog.encoding                      ascii
changelog.rollover-time                 15
changelog.fsync-interval                5
changelog.changelog-barrier-timeout     120
changelog.capture-del-path              off
features.barrier                        disable
features.barrier-timeout                120
features.trash                          off
features.trash-dir                      .trashcan
features.trash-eliminate-path           (null)
features.trash-max-filesize             5MB
features.trash-internal-op              off
cluster.enable-shared-storage           disable
cluster.write-freq-threshold            0
cluster.read-freq-threshold             0
cluster.tier-pause                      off
cluster.tier-promote-frequency          120
cluster.tier-demote-frequency           3600
cluster.watermark-hi                    90
cluster.watermark-low                   75
cluster.tier-mode                       cache
cluster.tier-max-promote-file-size      0
cluster.tier-max-mb                     4000
cluster.tier-max-files                  10000
cluster.tier-query-limit                100
cluster.tier-compact                    on
cluster.tier-hot-compact-frequency      604800
cluster.tier-cold-compact-frequency     604800
features.ctr-enabled                    off
features.record-counters                off
features.ctr-record-metadata-heat       off
features.ctr_link_consistency           off
features.ctr_lookupheal_link_timeout    300
features.ctr_lookupheal_inode_timeout   300
features.ctr-sql-db-cachesize           12500
features.ctr-sql-db-wal-autocheckpoint  25000
features.selinux                        on
locks.trace                             off
locks.mandatory-locking                 off
cluster.disperse-self-heal-daemon       enable
cluster.quorum-reads                    no
client.bind-insecure                    (null)
features.shard                          off
features.shard-block-size               64MB
features.shard-lru-limit                16384
features.shard-deletion-rate            100
features.scrub-throttle                 lazy
features.scrub-freq                     biweekly
features.scrub                          false
features.expiry-time                    120
features.cache-invalidation             off
features.cache-invalidation-timeout     60
features.leases                         off
features.lease-lock-recall-timeout      60
disperse.background-heals               8
disperse.heal-wait-qlength              128
cluster.heal-timeout                    600
dht.force-readdirp                      on
disperse.read-policy                    gfid-hash
cluster.shd-max-threads                 1
cluster.shd-wait-qlength                1024
cluster.locking-scheme                  full
cluster.granular-entry-heal             no
features.locks-revocation-secs          0
features.locks-revocation-clear-all     false
features.locks-revocation-max-blocked   0
features.locks-monkey-unlocking         false
features.locks-notify-contention        no
features.locks-notify-contention-delay  5
disperse.shd-max-threads                1
disperse.shd-wait-qlength               1024
disperse.cpu-extensions                 auto
disperse.self-heal-window-size          1
cluster.use-compound-fops               off
performance.parallel-readdir            off
performance.rda-request-size            131072
performance.rda-low-wmark               4096
performance.rda-high-wmark              128KB
performance.rda-cache-limit             10MB
performance.nl-cache-positive-entry     false
performance.nl-cache-limit              10MB
performance.nl-cache-timeout            60
cluster.brick-multiplex                 off
cluster.max-bricks-per-process          0
disperse.optimistic-change-log          on
disperse.stripe-cache                   4
cluster.halo-enabled                    False
cluster.halo-shd-max-latency            99999
cluster.halo-nfsd-max-latency           5
cluster.halo-max-latency                5
cluster.halo-max-replicas               99999
cluster.halo-min-replicas               2
cluster.daemon-log-level                INFO
debug.delay-gen                         off
delay-gen.delay-percentage              10%
delay-gen.delay-duration                100000
delay-gen.enable
disperse.parallel-writes                on
features.sdfs                           on
features.cloudsync                      off
features.utime                          off
ctime.noatime                           on
feature.cloudsync-storetype             (null)
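On the mount side, once the second backup server is added, the fstab entry would look roughly like this ("gfs3" is just a stand-in for whatever the third node's hostname actually is):
# "gfs3" below is a placeholder for the actual hostname of our third node
gfs1:/gvol0 /mnt/glusterfs/ glusterfs defaults,direct-io-mode=disable,_netdev,backupvolfile-server=gfs2:gfs3,fetch-attempts=10 0 0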
Thanks again.
On Wed, 25 Dec 2019 at 05:51, Strahil <hunter86_bg at yahoo.com> wrote:
> Hi David,
>
> On Dec 24, 2019 02:47, David Cunningham <dcunningham at voisonics.com> wrote:
> >
> > Hello,
> >
> > In testing we found that actually the GFS client having access to all 3
> > nodes made no difference to performance. Perhaps that's because the 3rd
> > node that wasn't accessible from the client before was the arbiter node?
> It makes sense, as no data is being generated towards the arbiter.
> > Presumably we shouldn't have an arbiter node listed under
> > backupvolfile-server when mounting the filesystem? Since it doesn't
> > store all the data surely it can't be used to serve the data.
>
> I have my arbiter defined as the last backup and have had no issues so
> far. At least the admin can easily identify the bricks from the mount
> options.
>
> > We did have direct-io-mode=disable already as well, so that wasn't a
> > factor in the performance problems.
>
> Have you checked whether the client version is too old?
> Also, you can check the cluster's operation version:
> # gluster volume get all cluster.max-op-version
> # gluster volume get all cluster.op-version
>
> The cluster's op-version should be at the max-op-version.
>
> Two options come to mind:
> A) Upgrade to the latest Gluster v6 or even v7 (I know it won't be easy)
> and then set the op-version to the highest possible:
> # gluster volume get all cluster.max-op-version
> # gluster volume get all cluster.op-version
>
> B) Deploy an NFS-Ganesha server and connect the client over NFS v4.2 (and
> control the parallel connections from Ganesha).
>
> Can you provide your Gluster volume's options?
> 'gluster volume get <VOLNAME> all'
>
> > Thanks again for any advice.
> >
> >
> >
> > On Mon, 23 Dec 2019 at 13:09, David Cunningham <dcunningham at voisonics.com> wrote:
> >>
> >> Hi Strahil,
> >>
> >> Thanks for that. We do have one backup server specified, but will add
> >> the second backup as well.
> >>
> >>
> >> On Sat, 21 Dec 2019 at 11:26, Strahil <hunter86_bg at yahoo.com> wrote:
> >>>
> >>> Hi David,
> >>>
> >>> Also consider using the mount option to specify backup servers via
> >>> 'backupvolfile-server=server2:server3' (you can define more, but I
> >>> don't think replica volumes greater than 3 are useful, except maybe in
> >>> some special cases).
> >>>
> >>> That way, when the primary is lost, your client can reach a backup
> >>> one without disruption.
> >>>
> >>> P.S.: The client may 'hang' if the primary server was rebooted
> >>> ungracefully, as the communication must time out before FUSE addresses
> >>> the next server. There is a special script for killing gluster
> >>> processes in '/usr/share/gluster/scripts' which can be used to set up
> >>> a systemd service to do that for you on shutdown.
> >>>
> >>> Best Regards,
> >>> Strahil Nikolov
> >>>
> >>> On Dec 20, 2019 23:49, David Cunningham <dcunningham at voisonics.com> wrote:
> >>>>
> >>>> Hi Strahil,
> >>>>
> >>>> Ah, that is an important point. One of the nodes is not accessible
> >>>> from the client, and we assumed that it only needed to reach the GFS
> >>>> node that was mounted, so we didn't think anything of it.
> >>>>
> >>>> We will try making all nodes accessible, as well as
> >>>> "direct-io-mode=disable".
> >>>>
> >>>> Thank you.
> >>>>
> >>>>
> >>>> On Sat, 21 Dec 2019 at 10:29, Strahil Nikolov <hunter86_bg at yahoo.com> wrote:
> >>>>>
> >>>>> Actually, I haven't made myself clear.
> >>>>> FUSE mounts on the client side connect directly to all the bricks
> >>>>> that make up the volume. If for some reason (bad routing, a firewall
> >>>>> block) the client can reach only 2 out of 3 bricks, this can
> >>>>> constantly trigger healing (as one of the bricks is never updated),
> >>>>> which will degrade performance and cause excessive network usage.
> >>>>> As your attachment is from one of the gluster nodes, this could be
> >>>>> the case.
> >>>>>
> >>>>> Best Regards,
> >>>>> Strahil Nikolov
> >>>>>
> >>>>> On Friday, 20 December 2019 at 01:49:56 GMT+2, David Cunningham
> >>>>> <dcunningham at voisonics.com> wrote:
> >>>>>
> >>>>>
> >>>>> Hi Strahil,
> >>>>>
> >>>>> The chart attached to my original email is taken from the GFS
> >>>>> server.
> >>>>>
> >>>>> I'm not sure what you mean by accessing all bricks simultaneously.
> >>>>> We've mounted it from the client like this:
> >>>>> gfs1:/gvol0 /mnt/glusterfs/ glusterfs defaults,direct-io-mode=disable,_netdev,backupvolfile-server=gfs2,fetch-attempts=10 0 0
> >>>>>
> >>>>> Should we do something different to access all bricks
> >>>>> simultaneously?
> >>>>>
> >>>>> Thanks for your help!
> >>>>>
> >>>>>
> >>>>> On Fri, 20 Dec 2019 at 11:47, Strahil Nikolov <hunter86_bg at yahoo.com> wrote:
> >>>>>>
> >>>>>> I'm not sure whether you measured the traffic from the client side
> >>>>>> (tcpdump on a client machine) or from the server side.
> >>>>>>
> >>>>>> In both cases, please verify that the client accesses all bricks
> >>>>>> simultaneously, as a client that cannot reach every brick can cause
> >>>>>> unnecessary heals.
> >>>>>>
> >>>>>> Have you thought about upgrading to v6? There are some enhancements
> >>>>>> in v6 which could be beneficial.
> >>>>>>
> >>>>>> Yet, it is indeed strange that so much traffic is generated with
> >>>>>> FUSE.
> >>>>>>
> >>>>>> Another approach is to test with NFS-Ganesha, which supports pNFS
> >>>>>> and can speak natively with Gluster; that can bring you closer to
> >>>>>> the previous setup and also provide some extra performance.
> >>>>>>
> >>>>>>
> >>>>>> Best Regards,
> >>>>>> Strahil Nikolov
> >>>>>>
> >>>>>>
> >>>>>>
> >>
> >>
> >> --
> >> David Cunningham, Voisonics Limited
> >> http://voisonics.com/
> >> USA: +1 213 221 1092
> >> New Zealand: +64 (0)28 2558 3782
> >
> >
> >
> > --
> > David Cunningham, Voisonics Limited
> > http://voisonics.com/
> > USA: +1 213 221 1092
> > New Zealand: +64 (0)28 2558 3782
>
> Best Regards,
> Strahil Nikolov
>
-- 
David Cunningham, Voisonics Limited
http://voisonics.com/
USA: +1 213 221 1092
New Zealand: +64 (0)28 2558 3782