Hari Gowtham
2018-Jan-18 10:12 UTC
[Gluster-users] Blocking IO when hot tier promotion daemon runs
Hi Tom,

The volume info doesn't show the hot bricks. I think you took the
volume info output before attaching the hot tier. Can you send the
volume info of the current setup where you see this issue?

The logs you sent are from a later point in time. The issue was hit
earlier than the period the available logs cover, so I need logs from
an earlier time. And along with the entire tier logs, can you send the
glusterd and brick logs too?

The rest of my comments are inline.

On Wed, Jan 10, 2018 at 9:03 PM, Tom Fite <tomfite at gmail.com> wrote:
> I should add that additional testing has shown that only accessing files is
> held up; IO is not interrupted for existing transfers. I think this points
> to the heat metadata in the sqlite DB for the tier. Is it possible that a
> table is temporarily locked while the promotion daemon runs, so the calls to
> update the access count on files are blocked?
>
> On Wed, Jan 10, 2018 at 10:17 AM, Tom Fite <tomfite at gmail.com> wrote:
>>
>> The sizes of the files are extremely varied: there are millions of small
>> (<1 MB) files and thousands of files larger than 1 GB.

The tier use case is for bigger files; it is not the best fit for
small files. That can end up hindering the IOs.

>> Attached is the tier log for gluster1 and gluster2. These are full of
>> "demotion failed" messages, which is also shown in the status:
>>
>> [root at pod-sjc1-gluster1 gv0]# gluster volume tier gv0 status
>> Node                 Promoted files   Demoted files   Status        run time in h:m:s
>> ---------            ---------        ---------       ---------     ---------
>> localhost            25940            0               in progress   112:21:49
>> pod-sjc1-gluster2    0                2917154         in progress   112:21:49
>>
>> Is it normal to have promotions and demotions only happen on each server
>> but not both?

No, it's not normal.

>> Volume info:
>>
>> [root at pod-sjc1-gluster1 ~]# gluster volume info
>>
>> Volume Name: gv0
>> Type: Distributed-Replicate
>> Volume ID: d490a9ec-f9c8-4f10-a7f3-e1b6d3ced196
>> Status: Started
>> Snapshot Count: 13
>> Number of Bricks: 3 x 2 = 6
>> Transport-type: tcp
>> Bricks:
>> Brick1: pod-sjc1-gluster1:/data/brick1/gv0
>> Brick2: pod-sjc1-gluster2:/data/brick1/gv0
>> Brick3: pod-sjc1-gluster1:/data/brick2/gv0
>> Brick4: pod-sjc1-gluster2:/data/brick2/gv0
>> Brick5: pod-sjc1-gluster1:/data/brick3/gv0
>> Brick6: pod-sjc1-gluster2:/data/brick3/gv0
>> Options Reconfigured:
>> performance.cache-refresh-timeout: 60
>> performance.stat-prefetch: on
>> server.allow-insecure: on
>> performance.flush-behind: on
>> performance.rda-cache-limit: 32MB
>> network.tcp-window-size: 1048576
>> performance.nfs.io-threads: on
>> performance.write-behind-window-size: 4MB
>> performance.nfs.write-behind-window-size: 512MB
>> performance.io-cache: on
>> performance.quick-read: on
>> features.cache-invalidation: on
>> features.cache-invalidation-timeout: 600
>> performance.cache-invalidation: on
>> performance.md-cache-timeout: 600
>> network.inode-lru-limit: 90000
>> performance.cache-size: 4GB
>> server.event-threads: 16
>> client.event-threads: 16
>> features.barrier: disable
>> transport.address-family: inet
>> nfs.disable: on
>> performance.client-io-threads: on
>> cluster.lookup-optimize: on
>> server.outstanding-rpc-limit: 1024
>> auto-delete: enable
>>
>>
>> # gluster volume status
>> Status of volume: gv0
>> Gluster process                             TCP Port  RDMA Port  Online  Pid
>> ------------------------------------------------------------------------------
>> Hot Bricks:
>> Brick pod-sjc1-gluster2:/data/hot_tier/gv0  49219     0          Y       26714
>> Brick pod-sjc1-gluster1:/data/hot_tier/gv0  49199     0          Y       21325
>> Cold Bricks:
>> Brick pod-sjc1-gluster1:/data/brick1/gv0    49152     0          Y       3178
>> Brick pod-sjc1-gluster2:/data/brick1/gv0    49152     0          Y       4818
>> Brick pod-sjc1-gluster1:/data/brick2/gv0    49153     0          Y       3186
>> Brick pod-sjc1-gluster2:/data/brick2/gv0    49153     0          Y       4829
>> Brick pod-sjc1-gluster1:/data/brick3/gv0    49154     0          Y       3194
>> Brick pod-sjc1-gluster2:/data/brick3/gv0    49154     0          Y       4840
>> Tier Daemon on localhost                    N/A       N/A        Y       20313
>> Self-heal Daemon on localhost               N/A       N/A        Y       32023
>> Tier Daemon on pod-sjc1-gluster1            N/A       N/A        Y       24758
>> Self-heal Daemon on pod-sjc1-gluster2       N/A       N/A        Y       12349
>>
>> Task Status of Volume gv0
>> ------------------------------------------------------------------------------
>> There are no active volume tasks
>>
>>
>> On Tue, Jan 9, 2018 at 10:33 PM, Hari Gowtham <hgowtham at redhat.com> wrote:
>>>
>>> Hi,
>>>
>>> Can you send the volume info, the volume status output, and the tier logs?
>>> And I need to know the size of the files that are being stored.
>>>
>>> On Tue, Jan 9, 2018 at 9:51 PM, Tom Fite <tomfite at gmail.com> wrote:
>>> > I've recently enabled an SSD backed 2 TB hot tier on my 150 TB, 2 server
>>> > / 3 bricks per server distributed replicated volume.
>>> >
>>> > I'm seeing IO get blocked across all client FUSE threads for 10 to 15
>>> > seconds while the promotion daemon runs. I see the 'glustertierpro'
>>> > thread jump to 99% CPU usage on both boxes when these delays occur,
>>> > and they happen every 25 minutes (my tier-promote-frequency setting).
>>> >
>>> > I suspect this has something to do with the heat database in sqlite;
>>> > maybe something is getting locked while it runs the query to determine
>>> > files to promote. My volume contains approximately 18 million files.
>>> >
>>> > Has anybody else seen this? I suspect that these delays will get worse
>>> > as I add more files to my volume, which will cause significant problems.
>>> >
>>> > Here are my hot tier settings:
>>> >
>>> > # gluster volume get gv0 all | grep tier
>>> > cluster.tier-pause                      off
>>> > cluster.tier-promote-frequency          1500
>>> > cluster.tier-demote-frequency           3600
>>> > cluster.tier-mode                       cache
>>> > cluster.tier-max-promote-file-size      10485760
>>> > cluster.tier-max-mb                     64000
>>> > cluster.tier-max-files                  100000
>>> > cluster.tier-query-limit                100
>>> > cluster.tier-compact                    on
>>> > cluster.tier-hot-compact-frequency      86400
>>> > cluster.tier-cold-compact-frequency     86400
>>> >
>>> > # gluster volume get gv0 all | grep threshold
>>> > cluster.write-freq-threshold            2
>>> > cluster.read-freq-threshold             5
>>> >
>>> > # gluster volume get gv0 all | grep watermark
>>> > cluster.watermark-hi                    92
>>> > cluster.watermark-low                   75
>>> >
>>> > _______________________________________________
>>> > Gluster-users mailing list
>>> > Gluster-users at gluster.org
>>> > http://lists.gluster.org/mailman/listinfo/gluster-users

--
Regards,
Hari Gowtham.
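For reference, the tier settings quoted above work out as follows in human units (a quick arithmetic sketch; note the promotion cycle matches the 25-minute delay cadence Tom reports, and the promote-size cap means no file over 10 MB is ever promoted):

```shell
# Translate the tier settings above into human units.
# Pure shell arithmetic; runnable anywhere.
promote_freq=1500          # cluster.tier-promote-frequency (seconds)
demote_freq=3600           # cluster.tier-demote-frequency (seconds)
max_promote_size=10485760  # cluster.tier-max-promote-file-size (bytes)

echo "promotion cycle: $((promote_freq / 60)) minutes"              # 25 minutes
echo "demotion cycle: $((demote_freq / 60)) minutes"                # 60 minutes
echo "max promotable file: $((max_promote_size / 1024 / 1024)) MB"  # 10 MB
```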
Tom Fite
2018-Jan-18 16:24 UTC
[Gluster-users] Blocking IO when hot tier promotion daemon runs
Thanks for the info, Hari. Sorry about the bad gluster volume info; I
grabbed that from a file, not realizing it was out of date. Here's the
current configuration showing the active hot tier:

[root at pod-sjc1-gluster1 ~]# gluster volume info

Volume Name: gv0
Type: Tier
Volume ID: d490a9ec-f9c8-4f10-a7f3-e1b6d3ced196
Status: Started
Snapshot Count: 13
Number of Bricks: 8
Transport-type: tcp
Hot Tier :
Hot Tier Type : Replicate
Number of Bricks: 1 x 2 = 2
Brick1: pod-sjc1-gluster2:/data/hot_tier/gv0
Brick2: pod-sjc1-gluster1:/data/hot_tier/gv0
Cold Tier:
Cold Tier Type : Distributed-Replicate
Number of Bricks: 3 x 2 = 6
Brick3: pod-sjc1-gluster1:/data/brick1/gv0
Brick4: pod-sjc1-gluster2:/data/brick1/gv0
Brick5: pod-sjc1-gluster1:/data/brick2/gv0
Brick6: pod-sjc1-gluster2:/data/brick2/gv0
Brick7: pod-sjc1-gluster1:/data/brick3/gv0
Brick8: pod-sjc1-gluster2:/data/brick3/gv0
Options Reconfigured:
performance.rda-low-wmark: 4KB
performance.rda-request-size: 128KB
storage.build-pgfid: on
cluster.watermark-low: 50
performance.readdir-ahead: off
cluster.tier-cold-compact-frequency: 86400
cluster.tier-hot-compact-frequency: 86400
features.ctr-sql-db-wal-autocheckpoint: 2500
cluster.tier-max-mb: 64000
cluster.tier-max-promote-file-size: 10485760
cluster.tier-max-files: 100000
cluster.tier-demote-frequency: 3600
server.allow-insecure: on
performance.flush-behind: on
performance.rda-cache-limit: 128MB
network.tcp-window-size: 1048576
performance.nfs.io-threads: off
performance.write-behind-window-size: 512MB
performance.nfs.write-behind-window-size: 4MB
performance.io-cache: on
performance.quick-read: on
features.cache-invalidation: on
features.cache-invalidation-timeout: 600
performance.cache-invalidation: on
performance.md-cache-timeout: 600
network.inode-lru-limit: 90000
performance.cache-size: 1GB
server.event-threads: 10
client.event-threads: 10
features.barrier: disable
transport.address-family: inet
nfs.disable: on
performance.client-io-threads: on
cluster.lookup-optimize: on
server.outstanding-rpc-limit: 2056
performance.stat-prefetch: on
performance.cache-refresh-timeout: 60
features.ctr-enabled: on
cluster.tier-mode: cache
cluster.tier-compact: on
cluster.tier-pause: off
cluster.tier-promote-frequency: 1500
features.record-counters: on
cluster.write-freq-threshold: 2
cluster.read-freq-threshold: 5
features.ctr-sql-db-cachesize: 262144
cluster.watermark-hi: 95
auto-delete: enable

It will take some time to get the logs together; I need to strip out
potentially sensitive info, and I will update with them when I have
them.

Any theories as to why the promotions / demotions only take place on
one box but not both?

-Tom

On Thu, Jan 18, 2018 at 5:12 AM, Hari Gowtham <hgowtham at redhat.com> wrote:
> Hi Tom,
>
> The volume info doesn't show the hot bricks. I think you took the
> volume info output before attaching the hot tier. Can you send the
> volume info of the current setup where you see this issue?
>
> The logs you sent are from a later point in time. The issue was hit
> earlier than the period the available logs cover, so I need logs from
> an earlier time. And along with the entire tier logs, can you send the
> glusterd and brick logs too?
>
> The rest of my comments are inline.
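On Tom's suspicion that a sqlite table is locked while the promotion daemon queries the heat database: each brick's CTR database is a sqlite file, and its journal mode determines whether readers block writers. The sketch below only prints the checks to run on the gluster nodes themselves; the `.glusterfs/gv0.db` path is an assumption about where the CTR database lives, so adjust it to the actual file found on the bricks.

```shell
# Print a journal-mode check for the assumed CTR sqlite DB on each
# cold brick (paths taken from the volume info above). This block only
# generates the commands; run the printed lines on each gluster node.
for brick in /data/brick1/gv0 /data/brick2/gv0 /data/brick3/gv0; do
  # Assumed DB location: <brick>/.glusterfs/<volname>.db
  echo sqlite3 "$brick/.glusterfs/gv0.db" \"PRAGMA journal_mode\;\"
done
```

A `wal` result would mean readers and writers mostly don't block each other; a rollback-journal mode would be more consistent with the observed stalls. The `features.ctr-sql-db-wal-autocheckpoint` option in the volume config suggests WAL is in play, but confirming on disk is cheap.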
Hari Gowtham
2018-Jan-19 10:26 UTC
[Gluster-users] Blocking IO when hot tier promotion daemon runs
Hi Tom,

From the logs you sent, I can see that files were skipped for
promotion because they exceeded the max size for promotion. This can
make the promotion count appear lower.

The other scenario is that the files are named similarly and are
accessed one after the other; that way, all these files are supposed
to be migrated to the same subvol. But zero promotions on a node is
not likely to happen.

The IO taking a long time is because of the large number of small
files.

Can you send the output of these too?

# getfattr -n "trusted.tier.fix.layout.complete" /data/brick1/gv0
# getfattr -n "trusted.glusterfs.dht" -e hex {for every brick}

Also send the version of gluster you are running and the size of the
data filled in each brick.

On Thu, Jan 18, 2018 at 9:54 PM, Tom Fite <tomfite at gmail.com> wrote:
> Thanks for the info, Hari. Sorry about the bad gluster volume info; I
> grabbed that from a file, not realizing it was out of date.

--
Regards,
Hari Gowtham.
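Hari's "{for every brick}" above can be expanded into concrete commands using the brick paths from the volume info earlier in the thread. The sketch below only prints the commands; run the printed lines on pod-sjc1-gluster1 and pod-sjc1-gluster2 themselves, where the brick paths are local.

```shell
# Generate the getfattr checks requested above, one per local brick
# path (hot tier plus the three cold bricks). This block only echoes
# the commands so they can be reviewed before running on each node.
echo getfattr -n trusted.tier.fix.layout.complete /data/brick1/gv0
for brick in /data/hot_tier/gv0 /data/brick1/gv0 /data/brick2/gv0 /data/brick3/gv0; do
  echo getfattr -n trusted.glusterfs.dht -e hex "$brick"
done
```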