Jeff Byers
2018-Jan-30 23:29 UTC
[Gluster-users] Tiered volume performance degrades badly after a volume stop/start or system restart.
I am fighting this issue:

Bug 1540376 - Tiered volume performance degrades badly after a volume stop/start or system restart.
https://bugzilla.redhat.com/show_bug.cgi?id=1540376

Does anyone have any ideas on what might be causing this, and what a fix or work-around might be?

Thanks!

~ Jeff Byers ~

Tiered volume performance degrades badly after a volume stop/start or system restart.

The degradation is very significant, making the performance of an SSD hot tiered volume a fraction of what it was with the HDD before tiering.

Stopping and starting the tiered volume triggers the problem. Stopping and starting the Gluster services also does.

Nothing in the tier is being promoted or demoted: the volume starts empty, a file is written, then read, then deleted. The file(s) only ever exist on the hot tier.

This affects GlusterFS FUSE mounts, and also NFSv3 mounts. The problem has been reproduced in two test lab environments. The issue was first seen using GlusterFS 3.7.18, and retested with the same result using GlusterFS 3.12.3.

I'm using the default tiering settings, no adjustments.

Nothing of any significance appears to be reported in the GlusterFS logs.

Summary:

Before SSD tiering, HDD performance on a FUSE mount was 130.87 MB/sec writes, 128.53 MB/sec reads.

After SSD tiering, performance on a FUSE mount was 199.99 MB/sec writes, 257.28 MB/sec reads.

After GlusterFS volume stop/start, SSD tiering performance on a FUSE mount was 35.81 MB/sec writes, 37.33 MB/sec reads. A very significant reduction in performance.

Detaching and reattaching the SSD tier restores the good tiered performance.

~ Jeff Byers ~
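To put the summary figures in proportion, here is a small Python sketch that expresses the post-restart throughput as a percentage of the two earlier measurements. The numbers are copied from the summary above; the dictionary labels and the helper name `relative_to` are made up for illustration only.

```python
# Throughput figures (MB/sec) copied from the summary in the message above.
results = {
    "hdd_before_tiering": {"write": 130.87, "read": 128.53},
    "ssd_tier_attached": {"write": 199.99, "read": 257.28},
    "ssd_tier_after_restart": {"write": 35.81, "read": 37.33},
}


def relative_to(baseline, scenario):
    """Return the scenario's throughput as a percentage of the baseline, per operation."""
    return {
        op: round(100.0 * results[scenario][op] / results[baseline][op], 1)
        for op in ("write", "read")
    }


vs_hdd = relative_to("hdd_before_tiering", "ssd_tier_after_restart")
vs_tiered = relative_to("ssd_tier_attached", "ssd_tier_after_restart")

print("After restart, as % of HDD-only baseline:", vs_hdd)
print("After restart, as % of freshly tiered volume:", vs_tiered)
```

By this arithmetic the restarted tiered volume runs at roughly 27 to 29 percent of the HDD-only baseline, and at roughly 15 to 18 percent of the freshly tiered volume.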
Vlad Kopylov
2018-Jan-31 06:17 UTC
[Gluster-users] Tiered volume performance degrades badly after a volume stop/start or system restart.
Tested it in two different environments lately with exactly the same results. I was trying to get better read performance from local mounts with hundreds of thousands of maildir email files by using SSD, hoping that .gluster file stat reads would improve, since those do migrate to the hot tier. After seeing what you described for 24 hours, and confirming that all the moving around between the tiers was done, I killed it.

Here are my volume settings; maybe they will be useful for spotting conflicting ones.

cluster.shd-max-threads: 12
performance.rda-cache-limit: 128MB
cluster.readdir-optimize: on
cluster.read-hash-mode: 0
performance.strict-o-direct: on
cluster.lookup-unhashed: auto
performance.nl-cache: on
performance.nl-cache-timeout: 600
cluster.lookup-optimize: on
client.event-threads: 8
performance.client-io-threads: on
performance.md-cache-timeout: 600
server.event-threads: 8
features.cache-invalidation: on
features.cache-invalidation-timeout: 600
performance.stat-prefetch: on
performance.cache-invalidation: on
network.inode-lru-limit: 90000
performance.cache-refresh-timeout: 10
performance.enable-least-priority: off
performance.cache-size: 2GB
cluster.nufa: on
cluster.choose-local: on
server.outstanding-rpc-limit: 128

FUSE mount options:
defaults,_netdev,negative-timeout=10,attribute-timeout=30,fopen-keep-cache,direct-io-mode=enable,fetch-attempts=5
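One way to spot differing options between two affected setups is to parse each "key: value" list and compare them. The following is only a sketch: the helper names `parse_options` and `diff_options` are invented here, the first sample is a short excerpt of the settings listed above, and the second sample is entirely hypothetical.

```python
def parse_options(text):
    """Parse 'key: value' lines into a dict, skipping blank or malformed lines."""
    options = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() and value.strip():
            options[key.strip()] = value.strip()
    return options


def diff_options(mine, theirs):
    """Return {option: (mine_value, theirs_value)} for every option that differs.

    An option missing on one side is reported as None on that side.
    """
    keys = sorted(set(mine) | set(theirs))
    return {k: (mine.get(k), theirs.get(k)) for k in keys if mine.get(k) != theirs.get(k)}


# Excerpt of the settings reported in the message above.
reported = parse_options("""
performance.strict-o-direct: on
performance.cache-size: 2GB
cluster.nufa: on
""")

# Hypothetical settings from a second volume, for illustration only.
other = parse_options("""
performance.strict-o-direct: off
performance.cache-size: 2GB
""")

differences = diff_options(reported, other)
for name, (mine_value, theirs_value) in differences.items():
    print(f"{name}: {mine_value} vs {theirs_value}")
```

Options that appear on only one side show up with None for the other, which makes it easy to see which settings were changed from whatever the other volume uses.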
Jeff Byers
2018-Feb-01 17:32 UTC
[Gluster-users] Tiered volume performance degrades badly after a volume stop/start or system restart.
This problem appears to be related to the sqlite3 DB files that are used for the tiering file access counters, stored on each hot and cold tier brick in .glusterfs/<volname>.db.

When the tier is first created, these DB files do not exist; they are created, and everything works fine.

On a stop/start or service restart, the .db files are already present, albeit empty: I have neither cluster.write-freq-threshold nor cluster.read-freq-threshold set, so features.record-counters is off and nothing should be going into the DB.

I've found that if I delete these .db files after the volume stop, but before the volume start, the tiering performance is normal, not degraded. Of course all of the history in these DB files is lost. I'm not sure what other ramifications there are to deleting these .db files.

When I did have one of the freq-threshold settings set, I did see a record get added to the file, so the sqlite3 DB is working to some degree.

The sqlite3 version I have installed is sqlite-3.6.20-1.el6_7.2.x86_64.

~ Jeff Byers ~
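The work-around described above (removing the tier .db files between the volume stop and the volume start) could be scripted roughly as follows. This is a cautious sketch, not a tested procedure: it renames the files instead of deleting them so they can be restored, the function names and brick paths are hypothetical, and the extra "-wal" and "-shm" names are only an assumption that SQLite write-ahead-log companion files might sit next to the DB. It should only ever be run while the volume is stopped.

```python
import os
import time


def tier_db_files(brick_path, volname):
    """List existing tier DB files under one brick.

    Looks for <brick>/.glusterfs/<volname>.db and, as an assumption,
    SQLite companion files with -wal and -shm suffixes.
    """
    base = os.path.join(brick_path, ".glusterfs", f"{volname}.db")
    candidates = [base, base + "-wal", base + "-shm"]
    return [path for path in candidates if os.path.exists(path)]


def move_tier_dbs_aside(brick_paths, volname, dry_run=True):
    """Rename each tier DB file to <name>.bak-<timestamp>.

    With dry_run=True nothing is changed; the planned renames are returned.
    Returns a list of (old_path, new_path) tuples.
    """
    stamp = time.strftime("%Y%m%d-%H%M%S")
    moves = []
    for brick in brick_paths:
        for path in tier_db_files(brick, volname):
            target = f"{path}.bak-{stamp}"
            if not dry_run:
                os.rename(path, target)
            moves.append((path, target))
    return moves


if __name__ == "__main__":
    # Hypothetical brick paths and volume name; replace with real ones.
    for old, new in move_tier_dbs_aside(["/bricks/hot1", "/bricks/cold1"], "myvol"):
        print(f"would rename {old} -> {new}")
```

Renaming keeps the access-counter history available in case removing the files turns out to have side effects, which the message above notes are not yet known.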