resending with parsed logs...

>>I am having issues with 3.6.6 where the load will spike up to 800% for
>>one of the glusterfsd processes and the users can no longer access the
>>system. If I reboot the node, the heal will finish normally after a
>>few minutes and the system will be responsive, but a few hours later
>>the issue will start again. It looks like it is hanging in a heal and
>>spinning up the load on one of the bricks. The heal gets stuck and
>>says it is crawling and never returns. After a few minutes of the
>>heal saying it is crawling, the load spikes up and the mounts become
>>unresponsive.
>>
>>Any suggestions on how to fix this? It has us stopped cold as the
>>users can no longer access the systems when the load spikes... Logs
>>attached.
>>
>>System setup info is:
>>
>>[root at gfs01a ~]# gluster volume info homegfs
>>
>>Volume Name: homegfs
>>Type: Distributed-Replicate
>>Volume ID: 1e32672a-f1b7-4b58-ba94-58c085e59071
>>Status: Started
>>Number of Bricks: 4 x 2 = 8
>>Transport-type: tcp
>>Bricks:
>>Brick1: gfsib01a.corvidtec.com:/data/brick01a/homegfs
>>Brick2: gfsib01b.corvidtec.com:/data/brick01b/homegfs
>>Brick3: gfsib01a.corvidtec.com:/data/brick02a/homegfs
>>Brick4: gfsib01b.corvidtec.com:/data/brick02b/homegfs
>>Brick5: gfsib02a.corvidtec.com:/data/brick01a/homegfs
>>Brick6: gfsib02b.corvidtec.com:/data/brick01b/homegfs
>>Brick7: gfsib02a.corvidtec.com:/data/brick02a/homegfs
>>Brick8: gfsib02b.corvidtec.com:/data/brick02b/homegfs
>>Options Reconfigured:
>>performance.io-thread-count: 32
>>performance.cache-size: 128MB
>>performance.write-behind-window-size: 128MB
>>server.allow-insecure: on
>>network.ping-timeout: 42
>>storage.owner-gid: 100
>>geo-replication.indexing: off
>>geo-replication.ignore-pid-check: on
>>changelog.changelog: off
>>changelog.fsync-interval: 3
>>changelog.rollover-time: 15
>>server.manage-gids: on
>>diagnostics.client-log-level: WARNING
>>
>>[root at gfs01a ~]# rpm -qa | grep gluster
>>gluster-nagios-common-0.1.1-0.el6.noarch
>>glusterfs-fuse-3.6.6-1.el6.x86_64
>>glusterfs-debuginfo-3.6.6-1.el6.x86_64
>>glusterfs-libs-3.6.6-1.el6.x86_64
>>glusterfs-geo-replication-3.6.6-1.el6.x86_64
>>glusterfs-api-3.6.6-1.el6.x86_64
>>glusterfs-devel-3.6.6-1.el6.x86_64
>>glusterfs-api-devel-3.6.6-1.el6.x86_64
>>glusterfs-3.6.6-1.el6.x86_64
>>glusterfs-cli-3.6.6-1.el6.x86_64
>>glusterfs-rdma-3.6.6-1.el6.x86_64
>>samba-vfs-glusterfs-4.1.11-2.el6.x86_64
>>glusterfs-server-3.6.6-1.el6.x86_64
>>glusterfs-extra-xlators-3.6.6-1.el6.x86_64

-------------- next part --------------
A non-text attachment was scrubbed...
Name: glusterfs-log.tgz
Type: application/x-compressed
Size: 880609 bytes
Desc: not available
URL: <http://www.gluster.org/pipermail/gluster-users/attachments/20160120/34032f34/attachment-0001.bin>
Pranith Kumar Karampuri
2016-Jan-21 05:01 UTC
[Gluster-users] [Gluster-devel] heal hanging
Hey,

Which process is consuming so much CPU? I went through the logs you gave me. I see that the following files are in gfid mismatch state:

<066e4525-8f8b-43aa-b7a1-86bbcecc68b9/safebrowsing-backup>
<1d48754b-b38c-403d-94e2-0f5c41d5f885/recovery.bak>
<ddc92637-303a-4059-9c56-ab23b1bb6ae9/patch0008.cnvrg>

Could you give me the output of "ls <brick-path>/indices/xattrop | wc -l" on all the bricks which are acting this way? This will tell us the number of pending self-heals on the system.

Pranith

On 01/20/2016 09:26 PM, David Robinson wrote:
> resending with parsed logs...
>>> I am having issues with 3.6.6 where the load will spike up to 800%
>>> for one of the glusterfsd processes and the users can no longer
>>> access the system. If I reboot the node, the heal will finish
>>> normally after a few minutes and the system will be responsive,
>>> but a few hours later the issue will start again. It looks like it
>>> is hanging in a heal and spinning up the load on one of the bricks.
>>> The heal gets stuck and says it is crawling and never returns.
>>> After a few minutes of the heal saying it is crawling, the load
>>> spikes up and the mounts become unresponsive.
>>> Any suggestions on how to fix this? It has us stopped cold as the
>>> users can no longer access the systems when the load spikes... Logs
>>> attached.
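The per-brick count requested in the reply could be collected with a small loop like the one below. This is only a sketch under stated assumptions: the message writes the index directory as `<brick-path>/indices/xattrop`, while the default used here (`.glusterfs/indices/xattrop` under the brick root) is an assumption about the on-disk layout, so check it against your bricks and override `INDEX_SUBDIR` if it differs. The function name `count_pending_heals` and the script name in the usage note are hypothetical.

```shell
#!/bin/sh
# Sketch: count entries in each brick's xattrop index directory.
# ASSUMPTION: the index lives at <brick>/.glusterfs/indices/xattrop;
# set INDEX_SUBDIR to another relative path if your layout differs.
INDEX_SUBDIR="${INDEX_SUBDIR:-.glusterfs/indices/xattrop}"

# Hypothetical helper: prints the entry count for one brick path,
# or "missing" when the index directory does not exist.
count_pending_heals() {
    dir="$1/$INDEX_SUBDIR"
    if [ -d "$dir" ]; then
        # Same count as the command in the message: ls ... | wc -l
        ls "$dir" | wc -l | tr -d ' '
    else
        echo "missing"
    fi
}

# Brick paths are taken from the command line; nothing runs without them.
for brick in "$@"; do
    printf '%s %s\n' "$brick" "$(count_pending_heals "$brick")"
done
```

On gfs01a, for example, it might be run as `sh count_heals.sh /data/brick01a/homegfs /data/brick02a/homegfs` (the two bricks that `gluster volume info` lists for gfsib01a), then repeated on the other nodes with their own brick paths.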