Tom Fite
2018-Jan-09 16:21 UTC
[Gluster-users] Blocking IO when hot tier promotion daemon runs
I've recently enabled an SSD-backed 2 TB hot tier on my 150 TB distributed-replicated volume (2 servers, 3 bricks per server).

I'm seeing IO get blocked across all client FUSE threads for 10 to 15 seconds while the promotion daemon runs. When these delays occur, the 'glustertierpro' thread jumps to 99% CPU usage on both boxes. The delays happen every 25 minutes, which matches my tier-promote-frequency setting of 1500 seconds.

I suspect this has something to do with the heat database in SQLite: maybe something gets locked while the daemon runs the query that determines which files to promote. My volume contains approximately 18 million files.

Has anybody else seen this? I expect these delays to get worse as I add more files to the volume, which would cause significant problems.

Here are my hot tier settings:

# gluster volume get gv0 all | grep tier
cluster.tier-pause                       off
cluster.tier-promote-frequency           1500
cluster.tier-demote-frequency            3600
cluster.tier-mode                        cache
cluster.tier-max-promote-file-size       10485760
cluster.tier-max-mb                      64000
cluster.tier-max-files                   100000
cluster.tier-query-limit                 100
cluster.tier-compact                     on
cluster.tier-hot-compact-frequency       86400
cluster.tier-cold-compact-frequency      86400

# gluster volume get gv0 all | grep threshold
cluster.write-freq-threshold             2
cluster.read-freq-threshold              5

# gluster volume get gv0 all | grep watermark
cluster.watermark-hi                     92
cluster.watermark-low                    75
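The locking suspected above can be illustrated with plain SQLite, independent of Gluster. The Python sketch below is a hypothetical model, not Gluster's heat-database schema or tier code: a table named `heat` stands in for per-file access counters, one connection plays a long promotion query by holding an open read transaction, and a second connection plays whatever updates the heat counters (an assumption here). In SQLite's default rollback-journal mode, the writer cannot commit while another connection holds a read lock, so it waits out its busy timeout and then fails with "database is locked". Gluster may configure its database differently; in WAL mode, for example, readers do not block a writer, so this particular conflict would not arise. Treat it as an illustration of the kind of contention suspected, not evidence that it is the cause.

```python
import os
import sqlite3
import tempfile


def writer_is_blocked(db_path: str, hold_read: bool, busy_timeout: float = 0.2) -> bool:
    """Return True if an update cannot commit within busy_timeout seconds.

    Hypothetical model only: 'heat' is an illustrative table, not Gluster's schema.
    """
    # Build a small heat table and commit it so every connection sees it.
    setup = sqlite3.connect(db_path)
    setup.execute("CREATE TABLE IF NOT EXISTS heat (gfid TEXT PRIMARY KEY, reads INTEGER)")
    setup.executemany(
        "INSERT OR REPLACE INTO heat VALUES (?, ?)",
        [(f"file{i}", i % 10) for i in range(1000)],
    )
    setup.commit()
    setup.close()

    reader = None
    if hold_read:
        # Stand-in for a long promotion query: an explicit read transaction left open.
        reader = sqlite3.connect(db_path, isolation_level=None)  # manual BEGIN/COMMIT
        reader.execute("BEGIN")
        reader.execute("SELECT gfid FROM heat WHERE reads >= 5").fetchall()  # holds a SHARED lock

    # Stand-in for a heat-counter update that gives up after busy_timeout seconds.
    writer = sqlite3.connect(db_path, timeout=busy_timeout)
    try:
        writer.execute("UPDATE heat SET reads = reads + 1 WHERE gfid = 'file1'")
        writer.commit()  # needs an EXCLUSIVE lock, which must wait for all readers to finish
        return False
    except sqlite3.OperationalError as exc:
        if "locked" not in str(exc):
            raise
        return True
    finally:
        writer.close()  # closing with an uncommitted transaction rolls it back
        if reader is not None:
            reader.execute("COMMIT")  # release the reader's SHARED lock
            reader.close()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "heat.db")
        print("blocked while a read transaction is open:", writer_is_blocked(path, hold_read=True))
        print("blocked with no concurrent reader:", writer_is_blocked(path, hold_read=False))
```

The control case (`hold_read=False`) commits immediately, which isolates the open read transaction as the only difference between the two runs.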
Hari Gowtham
2018-Jan-10 03:33 UTC
[Gluster-users] Blocking IO when hot tier promotion daemon runs
Hi,

Can you send the volume info, the volume status output, and the tier logs? I also need to know the sizes of the files being stored.

--
Regards,
Hari Gowtham.
Tom Fite
2018-Jan-10 15:17 UTC
[Gluster-users] Blocking IO when hot tier promotion daemon runs
The sizes of the files are extremely varied: there are millions of small (<1 MB) files and thousands of files larger than 1 GB.

Attached are the tier logs for gluster1 and gluster2. They are full of "demotion failed" messages, which is also reflected in the status:

[root@pod-sjc1-gluster1 gv0]# gluster volume tier gv0 status
Node                 Promoted files   Demoted files   Status        run time in h:m:s
---------            ---------        ---------       ---------     ---------
localhost            25940            0               in progress   112:21:49
pod-sjc1-gluster2    0                2917154         in progress   112:21:49

Is it normal for promotions to happen on only one server and demotions on only the other, rather than both happening on each?

Volume info:

[root@pod-sjc1-gluster1 ~]# gluster volume info

Volume Name: gv0
Type: Distributed-Replicate
Volume ID: d490a9ec-f9c8-4f10-a7f3-e1b6d3ced196
Status: Started
Snapshot Count: 13
Number of Bricks: 3 x 2 = 6
Transport-type: tcp
Bricks:
Brick1: pod-sjc1-gluster1:/data/brick1/gv0
Brick2: pod-sjc1-gluster2:/data/brick1/gv0
Brick3: pod-sjc1-gluster1:/data/brick2/gv0
Brick4: pod-sjc1-gluster2:/data/brick2/gv0
Brick5: pod-sjc1-gluster1:/data/brick3/gv0
Brick6: pod-sjc1-gluster2:/data/brick3/gv0
Options Reconfigured:
performance.cache-refresh-timeout: 60
performance.stat-prefetch: on
server.allow-insecure: on
performance.flush-behind: on
performance.rda-cache-limit: 32MB
network.tcp-window-size: 1048576
performance.nfs.io-threads: on
performance.write-behind-window-size: 4MB
performance.nfs.write-behind-window-size: 512MB
performance.io-cache: on
performance.quick-read: on
features.cache-invalidation: on
features.cache-invalidation-timeout: 600
performance.cache-invalidation: on
performance.md-cache-timeout: 600
network.inode-lru-limit: 90000
performance.cache-size: 4GB
server.event-threads: 16
client.event-threads: 16
features.barrier: disable
transport.address-family: inet
nfs.disable: on
performance.client-io-threads: on
cluster.lookup-optimize: on
server.outstanding-rpc-limit: 1024
auto-delete: enable

# gluster volume status
Status of volume: gv0
Gluster process                              TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Hot Bricks:
Brick pod-sjc1-gluster2:/data/hot_tier/gv0   49219     0          Y       26714
Brick pod-sjc1-gluster1:/data/hot_tier/gv0   49199     0          Y       21325
Cold Bricks:
Brick pod-sjc1-gluster1:/data/brick1/gv0     49152     0          Y       3178
Brick pod-sjc1-gluster2:/data/brick1/gv0     49152     0          Y       4818
Brick pod-sjc1-gluster1:/data/brick2/gv0     49153     0          Y       3186
Brick pod-sjc1-gluster2:/data/brick2/gv0     49153     0          Y       4829
Brick pod-sjc1-gluster1:/data/brick3/gv0     49154     0          Y       3194
Brick pod-sjc1-gluster2:/data/brick3/gv0     49154     0          Y       4840
Tier Daemon on localhost                     N/A       N/A        Y       20313
Self-heal Daemon on localhost                N/A       N/A        Y       32023
Tier Daemon on pod-sjc1-gluster1             N/A       N/A        Y       24758
Self-heal Daemon on pod-sjc1-gluster2        N/A       N/A        Y       12349

Task Status of Volume gv0
------------------------------------------------------------------------------
There are no active volume tasks
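The one-server-promotes, other-server-demotes pattern asked about above can be checked mechanically from the tier status output. The Python sketch below assumes the column layout shown in this thread (node, promoted files, demoted files, a possibly multi-word status, run time in h:m:s); it is not an official parser for `gluster volume tier gv0 status`, and the names `parse_tier_status`, `TierRow` and `one_directional` are hypothetical. It flags every node that has so far moved files in only one direction.

```python
from typing import List, NamedTuple


class TierRow(NamedTuple):
    node: str
    promoted: int
    demoted: int
    status: str
    runtime_hours: float


def parse_tier_status(text: str) -> List[TierRow]:
    """Parse data rows of the tier status table (layout assumed from this thread)."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # Data rows look like: node, promoted, demoted, status words..., h:m:s.
        # Header and separator rows fail the numeric checks and are skipped.
        if len(fields) < 5 or not fields[1].isdigit() or not fields[2].isdigit():
            continue
        hours, minutes, seconds = (int(x) for x in fields[-1].split(":"))
        rows.append(
            TierRow(
                node=fields[0],
                promoted=int(fields[1]),
                demoted=int(fields[2]),
                status=" ".join(fields[3:-1]),
                runtime_hours=hours + minutes / 60 + seconds / 3600,
            )
        )
    return rows


def one_directional(rows: List[TierRow]) -> List[str]:
    """Nodes that have only promoted, or only demoted, files so far."""
    return [r.node for r in rows if (r.promoted == 0) != (r.demoted == 0)]


STATUS = """\
Node                 Promoted files   Demoted files   Status        run time in h:m:s
---------            ---------        ---------       ---------     ---------
localhost            25940            0               in progress   112:21:49
pod-sjc1-gluster2    0                2917154         in progress   112:21:49
"""

if __name__ == "__main__":
    parsed = parse_tier_status(STATUS)
    for row in parsed:
        print(row)
    print("one-directional nodes:", one_directional(parsed))
```

On the numbers posted here, both nodes come back as one-directional: localhost (pod-sjc1-gluster1, where the command was run) has only promotions, and pod-sjc1-gluster2 has only demotions.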