Jeff Byers
2018-Jan-30 23:29 UTC
[Gluster-users] Tiered volume performance degrades badly after a volume stop/start or system restart.
I am fighting this issue:

Bug 1540376 - Tiered volume performance degrades badly after a volume stop/start or system restart.
https://bugzilla.redhat.com/show_bug.cgi?id=1540376

Does anyone have any ideas on what might be causing this, and what a fix or work-around might be?

Thanks!

~ Jeff Byers ~

Tiered volume performance degrades badly after a volume stop/start or system restart.

The degradation is very significant, making the performance of an SSD hot-tiered volume a fraction of what it was with the HDD alone before tiering.

Stopping and starting the tiered volume causes the problem to appear. Stopping and starting the Gluster services also does. Nothing in the tier is being promoted or demoted: the volume starts empty, a file is written, then read, then deleted. The file(s) only ever exist on the hot tier.

This affects GlusterFS FUSE mounts, and also NFSv3 NFS mounts. The problem has been reproduced in two test lab environments. The issue was first seen using GlusterFS 3.7.18, and retested with the same result using GlusterFS 3.12.3. I'm using the default tiering settings, with no adjustments. Nothing of any significance appears to be reported in the GlusterFS logs.

Summary:

Before SSD tiering, HDD performance on a FUSE mount was 130.87 MB/sec writes, 128.53 MB/sec reads.

After SSD tiering, performance on a FUSE mount was 199.99 MB/sec writes, 257.28 MB/sec reads.

After a GlusterFS volume stop/start, SSD tiering performance on a FUSE mount was 35.81 MB/sec writes, 37.33 MB/sec reads. A very significant reduction in performance.

Detaching and reattaching the SSD tier restores the good tiered performance.

~ Jeff Byers ~
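For anyone trying to reproduce this, a minimal sketch of the sequence described above; the volume name, server names, brick paths, mount point, and file size are placeholders, not the exact ones used in these tests:

  # Attach an SSD hot tier to an existing volume
  gluster volume tier myvol attach ssd-node:/bricks/ssd/myvol

  # Write, read, and delete a test file through a FUSE mount
  mount -t glusterfs node1:/myvol /mnt/myvol
  dd if=/dev/zero of=/mnt/myvol/testfile bs=1M count=4096 oflag=direct
  dd if=/mnt/myvol/testfile of=/dev/null bs=1M iflag=direct
  rm -f /mnt/myvol/testfile

  # Stop and start the volume, then repeat the dd test; throughput drops sharply
  gluster volume stop myvol
  gluster volume start myvol

  # Detaching and reattaching the hot tier restores the original performance
  # (wait for 'detach status' to show completed before committing)
  gluster volume tier myvol detach start
  gluster volume tier myvol detach status
  gluster volume tier myvol detach commit
  gluster volume tier myvol attach ssd-node:/bricks/ssd/myvol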
Vlad Kopylov
2018-Jan-31 06:17 UTC
[Gluster-users] Tiered volume performance degrades badly after a volume stop/start or system restart.
Tested it in two different environments lately with exactly the same results. I was trying to get better read performance from local mounts holding hundreds of thousands of maildir email files by using an SSD tier, hoping that stat reads of the .glusterfs files would improve, since those do migrate to the hot tier. After seeing what you described for 24 hours, and confirming that all movement between the tiers was done, I killed it.

Here are my volume settings - maybe they will be useful for spotting conflicting ones.

cluster.shd-max-threads: 12
performance.rda-cache-limit: 128MB
cluster.readdir-optimize: on
cluster.read-hash-mode: 0
performance.strict-o-direct: on
cluster.lookup-unhashed: auto
performance.nl-cache: on
performance.nl-cache-timeout: 600
cluster.lookup-optimize: on
client.event-threads: 8
performance.client-io-threads: on
performance.md-cache-timeout: 600
server.event-threads: 8
features.cache-invalidation: on
features.cache-invalidation-timeout: 600
performance.stat-prefetch: on
performance.cache-invalidation: on
network.inode-lru-limit: 90000
performance.cache-refresh-timeout: 10
performance.enable-least-priority: off
performance.cache-size: 2GB
cluster.nufa: on
cluster.choose-local: on
server.outstanding-rpc-limit: 128

FUSE mount options:
defaults,_netdev,negative-timeout=10,attribute-timeout=30,fopen-keep-cache,direct-io-mode=enable,fetch-attempts=5
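In case it helps anyone compare, a quick sketch of how settings like these get applied; the volume name, server, and mount point below are only placeholders:

  # Options are set per volume
  gluster volume set myvol performance.cache-size 2GB
  gluster volume set myvol cluster.choose-local on
  gluster volume info myvol

  # /etc/fstab entry carrying the FUSE mount options above
  node1:/myvol  /mnt/myvol  glusterfs  defaults,_netdev,negative-timeout=10,attribute-timeout=30,fopen-keep-cache,direct-io-mode=enable,fetch-attempts=5  0 0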
Jeff Byers
2018-Feb-01 17:32 UTC
[Gluster-users] Tiered volume performance degrades badly after a volume stop/start or system restart.
This problem appears to be related to the sqlite3 DB files that are used for the tiering file access counters, stored on each hot and cold tier brick in .glusterfs/<volname>.db.

When the tier is first created, these DB files do not exist; they are created, and everything works fine. On a stop/start or service restart, the .db files are already present, albeit empty, since I don't have cluster.write-freq-threshold or cluster.read-freq-threshold set, so features.record-counters is off and nothing should be going into the DB.

I've found that if I delete these .db files after the volume stop, but before the volume start, the tiering performance is normal, not degraded. Of course, all of the history in these DB files is lost. I'm not sure what other ramifications there are to deleting these .db files.

When I did have one of the freq-threshold settings set, I did see a record get added to the file, so the sqlite3 DB is working to some degree. The sqlite3 version I have installed is sqlite-3.6.20-1.el6_7.2.x86_64.
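A rough sketch of the workaround, in case anyone wants to try it; the volume name and brick paths are just examples, and removing the DBs throws away whatever access history they hold:

  gluster volume stop myvol

  # On every hot and cold tier brick, remove the tiering counter DB
  # (the glob also catches any sqlite journal/WAL files, if present)
  rm -f /bricks/ssd/myvol/.glusterfs/myvol.db*
  rm -f /bricks/hdd/myvol/.glusterfs/myvol.db*

  gluster volume start myvol

  # To confirm whether the access counters are even enabled on the volume:
  gluster volume get myvol features.record-counters

  # The DB can be inspected with the sqlite3 shell before removing it:
  sqlite3 /bricks/ssd/myvol/.glusterfs/myvol.db ".tables"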
~ Jeff Byers ~