Scott Hazelhurst
2013-Mar-15 11:49 UTC
[Gluster-users] gluster fs hangs on certain operations
Dear all

We periodically run into serious problems where certain operations cause hanging, for example doing an ls on a directory. This is a recurrent problem and serious enough to threaten the feasibility of what we are doing.

We are running gluster 3.3.1 on SL 6.3. The bricks are formatted ext3.

Our configuration is:

Volume Name: A01
Type: Distributed-Replicate
Volume ID: dc0f100f-9e25-4559-9e38-4b14c66ed490
Status: Started
Number of Bricks: 3 x 2 = 6
Transport-type: tcp
Bricks:

The problem does seem to be related to heal failures. When I ask for heal info I get the following, which seems to indicate repeated attempts to self-heal:

[root at n05 ~]# gluster volume heal A01 info heal-failed
Gathering Heal info on volume A01 has been successful

Brick n01:/export/brickA01_1
Number of entries: 0

Brick n03:/export/brickA01_1
Number of entries: 0

Brick n112:/export/brickA01_1
Number of entries: 0

Brick n113:/export/brickA01_1
Number of entries: 0

Brick n105:/export/brickA01_1
Number of entries: 57
at                    path on brick
-----------------------------------
2013-03-15 13:39:05 /magd/1k/2013/1000-all_CLEAN.fam
2013-03-15 13:29:05 /magd/1k/2013/1000-all_CLEAN.bim
2013-03-15 13:29:05 /magd/1k/2013/1000-all_CLEAN.fam
2013-03-15 13:19:05 /magd/1k/2013/1000-all_CLEAN.bim
2013-03-15 13:19:05 /magd/1k/2013/1000-all_CLEAN.fam
2013-03-15 13:09:04 /magd/1k/2013/1000-all_CLEAN.bim
2013-03-15 13:09:04 /magd/1k/2013/1000-all_CLEAN.fam
2013-03-15 12:59:04 /magd/1k/2013/1000-all_CLEAN.bim
2013-03-15 12:59:04 /magd/1k/2013/1000-all_CLEAN.fam
2013-03-15 12:49:04 /magd/1k/2013/1000-all_CLEAN.bim
2013-03-15 12:49:04 /magd/1k/2013/1000-all_CLEAN.fam
2013-03-15 12:39:04 /magd/1k/2013/1000-all_CLEAN.bim
2013-03-15 12:39:04 /magd/1k/2013/1000-all_CLEAN.fam
2013-03-15 12:29:04 /magd/1k/2013/1000-all_CLEAN.bim
2013-03-15 12:29:04 /magd/1k/2013/1000-all_CLEAN.fam
2013-03-15 12:19:04 /magd/1k/2013/1000-all_CLEAN.bim
2013-03-15 12:19:04 /magd/1k/2013/1000-all_CLEAN.fam
2013-03-15 12:09:04 /magd/1k/2013/1000-all_CLEAN.bim
2013-03-15 12:09:04 /magd/1k/2013/1000-all_CLEAN.fam
2013-03-15 11:59:04 /magd/1k/2013/1000-all_CLEAN.bim
2013-03-15 11:59:04 /magd/1k/2013/1000-all_CLEAN.fam
2013-03-15 11:49:04 /magd/1k/2013/1000-all_CLEAN.bim
2013-03-15 11:49:04 /magd/1k/2013/1000-all_CLEAN.fam
2013-03-15 11:39:03 /magd/1k/2013/1000-all_CLEAN.bim
2013-03-15 11:39:03 /magd/1k/2013/1000-all_CLEAN.fam
2013-03-15 11:29:03 /magd/1k/2013/1000-all_CLEAN.bim
2013-03-15 11:29:03 /magd/1k/2013/1000-all_CLEAN.fam
2013-03-15 11:19:03 /magd/1k/2013/1000-all_CLEAN.bim
2013-03-15 11:19:03 /magd/1k/2013/1000-all_CLEAN.fam
2013-03-15 11:09:03 /magd/1k/2013/1000-all_CLEAN.bim
2013-03-15 11:09:03 /magd/1k/2013/1000-all_CLEAN.fam
2013-03-15 10:59:03 /magd/1k/2013/1000-all_CLEAN.bim
2013-03-15 10:59:03 /magd/1k/2013/1000-all_CLEAN.fam
2013-03-15 10:49:03 /magd/1k/2013/1000-all_CLEAN.bim
2013-03-15 10:49:03 /magd/1k/2013/1000-all_CLEAN.fam
2013-03-15 10:39:03 /magd/1k/2013/1000-all_CLEAN.bim
2013-03-15 10:39:03 /magd/1k/2013/1000-all_CLEAN.fam
2013-03-15 10:29:03 /magd/1k/2013/1000-all_CLEAN.bim
2013-03-15 10:29:03 /magd/1k/2013/1000-all_CLEAN.fam
2013-03-15 10:19:03 /magd/1k/2013/1000-all_CLEAN.bim
2013-03-15 10:19:03 /magd/1k/2013/1000-all_CLEAN.fam
2013-03-15 10:09:03 /magd/1k/2013/1000-all_CLEAN.bim
2013-03-15 10:09:03 /magd/1k/2013/1000-all_CLEAN.fam
2013-03-15 09:59:03 /magd/1k/2013/1000-all_CLEAN.bim
2013-03-15 09:59:03 /magd/1k/2013/1000-all_CLEAN.fam
2013-03-15 09:49:03 /magd/1k/2013/1000-all_CLEAN.bim
2013-03-15 09:49:03 /magd/1k/2013/1000-all_CLEAN.fam
2013-03-15 09:39:02 /magd/1k/2013/1000-all_CLEAN.bim
2013-03-15 09:39:02 /magd/1k/2013/1000-all_CLEAN.fam
2013-03-15 09:29:02 /magd/1k/2013/1000-all_CLEAN.bim
2013-03-15 09:29:02 /magd/1k/2013/1000-all_CLEAN.fam
2013-03-15 09:19:02 /magd/1k/2013/1000-all_CLEAN.bim
2013-03-15 09:19:02 /magd/1k/2013/1000-all_CLEAN.fam
2013-03-15 09:09:02 /magd/1k/2013/1000-all_CLEAN.bim
2013-03-15 09:09:02 /magd/1k/2013/1000-all_CLEAN.fam
2013-03-15 08:59:02 /magd/1k/2013/1000-all_CLEAN.bim
2013-03-15 08:59:02 /magd/1k/2013/1000-all_CLEAN.fam

The log files list requests to do heals.

I have looked at the underlying bricks where the files are and they seem fine.

Any help would be gratefully received.

Many thanks

Scott
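One way to follow up on "the bricks seem fine" is to compare the AFR changelog extended attributes of the same file on each brick that holds a replica; non-zero trusted.afr.* values indicate that a self-heal is still pending. This is only a sketch: the brick path is taken from the listing above, and the exact trusted.afr.* attribute names depend on the volume layout.

# run as root on each server holding a copy, against the brick path (not the mount)
getfattr -d -m . -e hex /export/brickA01_1/magd/1k/2013/1000-all_CLEAN.fam
# basic sanity checks that the copies really match
stat /export/brickA01_1/magd/1k/2013/1000-all_CLEAN.fam
md5sum /export/brickA01_1/magd/1k/2013/1000-all_CLEAN.fam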
Scott Hazelhurst
2013-Mar-17 13:39 UTC
[Gluster-users] gluster fs hangs on certain operations
To respond to myself: after about 18 hours the self-heal seemed to finish. I then unmounted the gluster directory on all machines and restarted the gluster daemon, and it all seems fine now.

This is the second time this has happened to us in about six weeks. The first time was after one of the servers had been down for a few days, but Friday's incident seemed to be provoked by heavy usage of the file system and not by any disruption to the network or service. In the previous incident there weren't any suggestions of problems either.
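For reference, a sketch of the recovery sequence described above. The client mount point (/mnt/A01) and the server mounted from (n01) are assumptions for illustration; SL 6.3 is assumed to use the SysV init script shipped with the gluster packages, and depending on what is hung the brick processes may need restarting as well as the management daemon.

# on each client
umount /mnt/A01
# on each server
service glusterd restart
# remount on the clients
mount -t glusterfs n01:/A01 /mnt/A01
# optionally trigger a full self-heal crawl instead of waiting for the periodic one,
# then watch its progress
gluster volume heal A01 full
gluster volume heal A01 info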
* Scott Hazelhurst <Scott.Hazelhurst at wits.ac.za> [2013 03 15, 11:49]:
> We are running gluster 3.3.1 on SL 6.3. The bricks are formatted ext3

http://joejulian.name/blog/glusterfs-bit-by-ext4-structure-change/ ?

Maybe the solution is "use xfs".

Regards
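If the ext4 readdir change described in that post is the suspect, a first step could be to confirm which filesystem and kernel the bricks are actually running on; and if a brick is ever rebuilt, XFS with 512-byte inodes is the layout commonly recommended for gluster bricks. A minimal sketch, where the device name is purely hypothetical:

# on a brick server: check the brick's filesystem type and the running kernel
df -T /export/brickA01_1
uname -r
# example of formatting a replacement brick as XFS (destructive -- device is an assumption)
mkfs.xfs -i size=512 /dev/sdb1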